Abstract. Previous deforestation and supercompilation algorithms may introduce accidental termination when applied to call-by-value programs. This hides looping bugs from the programmer, and changes the behavior of a program depending on whether it is optimized or not. We present a supercompilation algorithm for a higher-order call-by-value language and prove that the algorithm both terminates and preserves termination properties. This algorithm utilizes strictness information to decide whether to substitute or not and compares favorably with previous call-by-name transformations.



1. Introduction 

Intermediate data structures such as lists allow functional programmers to write clear and concise programs, but carry a cost at run-time since additional heap cells need to be both allocated and garbage collected. Both deforestation [57] and supercompilation [47] are automatic program transformations which remove many of these intermediate structures. In a call-by-value context these transformations are unsound, however, as they might hide infinite recursion from the programmer. Consider the program

(λx.y) (fac z).

This program may loop if the value of z is negative. Applying Wadler's deforestation algorithm to the program will result in y, which is sound under call-by-name or call-by-need. Under call-by-value, the non-termination in the original program has been removed, and hence the meaning of the program has been altered by the transformation.

This is unfortunate, since removing intermediate structures is perhaps even more important in a call-by-value language than in a lazy one: the entire intermediate structure has to remain in memory during the computation.

Ohori and Sasano [35] saw this need and presented a very elegant algorithm for call-by-value languages that removes intermediate structures. Their algorithm sacrifices some transformational power for algorithmic simplicity. In this article we explore a different part of the design space: a more powerful transformation at the cost of some algorithmic complexity. The outcome is a meaning-preserving supercompiler for pure call-by-value languages in general, together with measurements from an implementation in a compiler for Timber [3j], a pure call-by-value language.

Our current work is a necessary first step towards supercompiling impure call-by-value languages, of which there are many available today. Well-known examples are OCaml [25], Standard ML [29] and F# [49]. Considering that F# is currently being turned into a product, it is quite likely that strict functional languages will become even more popular in the future.

One might think that our result should be easy to obtain by modifying a call-by-name algorithm to simply delay beta reduction until every function argument has been specialized to a value. However, it turns out that this strategy misses even simple opportunities to remove intermediate structures. That is, eager specialization of function arguments risks destroying fold opportunities that might otherwise appear, which may preclude complexity improvements in the resulting program.

The novelty of our supercompilation algorithm is that it concentrates all call-by-value dependencies into a single rule that relies on the result of a separate strictness analysis for correct behavior. In effect, our transformation delays transformation of function arguments past inlining, much like a call-by-name scheme does, although only as far as call-by-value semantics allows. The result is an algorithm that is able to improve a wide range of illustrative examples like the existing algorithms do, but without the risk of introducing artificial termination.

The specific contributions of our work are: 

• We provide an algorithm for positive supercompilation including folding, for a strict and pure higher-order functional language (Section 4).

• We prove that the algorithm terminates and preserves the semantics of the program (Section 5).

• We show preliminary benchmarks from an implementation in the Timber compiler (Section 6).

We start out with some examples in Section 2 to give the reader an intuitive feel for how the algorithm behaves. Our language of study is defined in Section 3, right before the technical contributions are presented.



This article is an extended and improved version of a paper presented at POPL 2009 [20]. As well as clarifying a number of the examples and proofs, we give an improved formulation of $\mathcal{D}_{app}()$, presented in Section 4.1, and make a small change to how let-expressions are handled by the driving algorithm.

2. Examples 



Wadler [57] uses the example append (append xs ys) zs and shows that his deforestation algorithm transforms the program so that it saves one traversal of the first list, thereby reducing the complexity from 2|xs| + |ys| to |xs| + |ys|.

If we naively change Wadler's algorithm to call-by-value semantics by eagerly attempting to transform arguments before attacking the body, we do not achieve this improvement. The definition of append is:

append xs ys = case xs of
                 [] -> ys
                 (x' : xs') -> x' : append xs' ys
Below is an example of a hypothetical call-by-value variant of Wadler's deforestation algorithm that attacks arguments first:

append (append xs' ys') zs'

Inlining the body of the inner append and then pushing the outer call down into each branch gives

case xs' of
  [] -> append ys' zs'
  (x : xs) -> append (x : append xs ys') zs'

Transformation of the first branch will create a new function h1 that is isomorphic to append, and call it. The second branch contains an embedding of the initial expression, and blindly transforming it will lead to non-termination of the transformation algorithm. One must therefore split this expression in two parts: the subexpression x : append xs ys', which we call h2, and the outer expression append z zs', where z is fresh. Continuing with x : append xs ys' and inlining append gives

x : case xs of
      [] -> ys'
      (x' : xs') -> x' : append xs' ys'

The second branch contains a renaming of the expression we named h2, so we simply replace it with a call to h2. Moving back to append z zs', we notice that this expression is a renaming of the one called h1, so we replace it with the call h1 z zs'.

Assembling the pieces gives us the final result:

letrec h1 xs ys = case xs of
                    [] -> ys
                    (x' : xs') -> x' : h1 xs' ys
       h2 x xs ys = x : case xs of
                          [] -> ys
                          (x' : xs') -> h2 x' xs' ys
in case xs' of
     [] -> h1 ys' zs'
     (x : xs) -> h1 (h2 x xs ys') zs'

Notice that the intermediate structure from the input program is still there after the transformation, and the complexity is still 2|xs| + |ys|. This can be compared to how the same example is transformed by Wadler's algorithm, as shown in Figure 1. The reason our hypothetical call-by-value algorithm failed to improve the program is that it had to split expressions too early during the transformation, thereby preventing fold opportunities that occur in a call-by-name setting.

However, changing the call-by-value algorithm to do the exact opposite (carefully delaying the transformation of arguments to a function past the inlining of its body, but only as far as strictness allows) actually leads to the same result that Wadler obtains with append (append xs ys) zs. This is a key observation for obtaining deforestation under call-by-value without altering the semantics, and our transformation exploits it.

append (append xs' ys') zs'

Naming the first expression h1 and inlining both occurrences of append gives

case (case xs' of
        [] -> ys'
        (x1 : xs1) -> x1 : append xs1 ys') of
  [] -> zs'
  (x : xs) -> x : append xs zs'

Pushing the outer case-expression down into both branches of the inner one and reducing the resulting case-expression of a known constructor leads to

case xs' of
  [] -> case ys' of
          [] -> zs'
          (x : xs) -> x : append xs zs'
  (x1 : xs1) -> x1 : append (append xs1 ys') zs'

Transform each branch separately. Transformation of the second branch in the first branch will create a new function h2 that is isomorphic to append, and the second branch of the outer case is a renaming of our initial expression called h1. Assembling all pieces yields the following result:

letrec h1 xs ys zs = case xs of
                       [] -> case ys of
                               [] -> zs
                               (y' : ys') -> y' : h2 ys' zs
                       (x' : xs') -> x' : h1 xs' ys zs
       h2 xs ys = case xs of
                    [] -> ys
                    (x' : xs') -> x' : h2 xs' ys
in h1 xs' ys' zs'

Figure 1: Wadler's algorithm transforming append (append xs' ys') zs'
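To make the complexity claim concrete, the following Python sketch (an illustration only, not part of the paper's implementation) counts the cons cells allocated by append (append xs ys) zs and by the residual h1/h2 of Figure 1. The linked-list encoding, the `Counter` class and the helper names `to_list`/`from_list` are assumptions made for this example.

```python
# Hypothetical sketch: compare cons-cell allocation of the unfused
# expression append (append xs ys) zs with the residual program h1/h2
# of Figure 1. Linked lists are (head, tail) pairs; None plays [].

class Counter:
    """Counts ':' applications, i.e. heap cells allocated."""
    def __init__(self):
        self.cells = 0

    def cons(self, x, xs):
        self.cells += 1
        return (x, xs)


def to_list(items):
    out = None
    for x in reversed(items):
        out = (x, out)
    return out


def from_list(lst):
    out = []
    while lst is not None:
        out.append(lst[0])
        lst = lst[1]
    return out


def append(c, xs, ys):
    # append xs ys = case xs of [] -> ys; (x' : xs') -> x' : append xs' ys
    if xs is None:
        return ys
    return c.cons(xs[0], append(c, xs[1], ys))


def h2(c, xs, ys):
    # residual copy of append
    if xs is None:
        return ys
    return c.cons(xs[0], h2(c, xs[1], ys))


def h1(c, xs, ys, zs):
    # walks xs once, then hands ys (and zs) over to h2
    if xs is None:
        if ys is None:
            return zs
        return c.cons(ys[0], h2(c, ys[1], zs))
    return c.cons(xs[0], h1(c, xs[1], ys, zs))


xs, ys, zs = to_list([1, 2, 3]), to_list([4, 5]), to_list([6])
unfused, fused = Counter(), Counter()
r_unfused = append(unfused, append(unfused, xs, ys), zs)
r_fused = h1(fused, xs, ys, zs)
print(from_list(r_fused), unfused.cells, fused.cells)  # [1, 2, 3, 4, 5, 6] 8 5
```

With |xs| = 3 and |ys| = 2, the unfused expression allocates 2|xs| + |ys| = 8 cells, whereas the residual program allocates |xs| + |ys| = 5.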

Except for the fundamental reliance on strictness analysis, which is necessary to preserve semantics, our transformation shares many of its rules with Wadler's algorithm. The transformation commonly referred to as case-of-case is crucial for our transformation, just as it is for a call-by-name algorithm. Case-of-case applies when a case-expression appears in the head of another case-expression: the outer case context is duplicated and pushed into all branches of the inner case-expression. Our transformation also contains rules corresponding to ordinary evaluation, which eliminate case-expressions that have a known constructor in their head or add two primitive numbers. The mechanism that ensures termination looks for terms that are "similar" to ones that have already been transformed; if such a term is encountered, the transformation stops and splits the term into smaller terms that are transformed separately. The remaining rules of our transformation simply shift focus to the proper subexpression and ensure that the algorithm does not get stuck.
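The case-of-case and known-constructor steps described above can be tried out on a toy syntax tree. The Python sketch below is hypothetical: the tuple encoding and the names `subst` and `simplify` are ours, it assumes all bound variable names are distinct (so substitution needs no capture-avoiding renaming), and it performs no folding or termination checking, unlike the full driving algorithm.

```python
# Hypothetical toy encoding (not the paper's): ("var", x), ("num", n),
# ("con", k, [args]), ("app", name, [args]) for opaque calls/primitives,
# and ("case", scrutinee, [((k, [x1, ..., xn]), body), ...]).
# Assumes distinct bound variable names: no capture-avoiding renaming.

def subst(env, t):
    tag = t[0]
    if tag == "var":
        return env.get(t[1], t)
    if tag == "num":
        return t
    if tag in ("con", "app"):
        return (tag, t[1], [subst(env, a) for a in t[2]])
    if tag == "case":
        alts = []
        for (k, xs), body in t[2]:
            inner = {v: e for v, e in env.items() if v not in xs}  # pattern vars shadow
            alts.append(((k, xs), subst(inner, body)))
        return ("case", subst(env, t[1]), alts)
    raise ValueError(f"unknown tag {tag!r}")


def simplify(t):
    tag = t[0]
    if tag in ("var", "num"):
        return t
    if tag in ("con", "app"):
        return (tag, t[1], [simplify(a) for a in t[2]])
    scrut, alts = simplify(t[1]), t[2]
    if scrut[0] == "case":
        # case-of-case: duplicate the outer alternatives into every inner branch
        pushed = [(pat, ("case", body, alts)) for pat, body in scrut[2]]
        return simplify(("case", scrut[1], pushed))
    if scrut[0] == "con":
        # case of a known constructor: select the branch and bind its variables
        for (k, xs), body in alts:
            if k == scrut[1]:
                return simplify(subst(dict(zip(xs, scrut[2])), body))
        raise ValueError("no matching alternative")
    return ("case", scrut, [(pat, simplify(body)) for pat, body in alts])


# case (case ys of [] -> []; (y : ys1) -> square y : map square ys1) of
#   [] -> 0; (x1 : xs1) -> x1 + sum xs1
inner = ("case", ("var", "ys"), [
    (("Nil", []), ("con", "Nil", [])),
    (("Cons", ["y", "ys1"]),
     ("con", "Cons", [("app", "square", [("var", "y")]),
                      ("app", "map", [("var", "square"), ("var", "ys1")])])),
])
outer = ("case", inner, [
    (("Nil", []), ("num", 0)),
    (("Cons", ["x1", "xs1"]),
     ("app", "+", [("var", "x1"), ("app", "sum", [("var", "xs1")])])),
])
result = simplify(outer)
# result corresponds to: case ys of [] -> 0; (y : ys1) -> square y + sum (map square ys1)
print(result)
```

The output has the same shape as the hand derivation for sum (map square ys) in the examples of this section.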

We claim that our transformation compares favorably with previous call-by-name transformations, and we now proceed with demonstrating the transformation on some common examples. The results of the transformation on these examples are identical to the results of Wadler's algorithm [57].

This does not hold in general; a counter-example is the transformation of the expression zip (map f xs) (map g ys), where Wadler's algorithm will eliminate both intermediate structures and our transformation will only eliminate the first intermediate structure.

Our first example is the transformation of sum (map square ys), where the referenced functions are defined as:

square x = x * x

map f xs = case xs of
             [] -> []
             (x : xs) -> f x : map f xs

sum xs = case xs of
           [] -> 0
           (x : xs) -> x + sum xs



We start our transformation by allocating a new fresh function name h0 to the expression sum (map square ys), inlining the body of sum, and substituting map square ys into the body of sum:

case map square ys of
  [] -> 0
  (x' : xs') -> x' + sum xs'

After inlining map and substituting the arguments into the body, the result becomes:

case (case ys of
        [] -> []
        (x' : xs') -> (square x') : map square xs') of
  [] -> 0
  (x' : xs') -> x' + sum xs'

We duplicate the outer case in each of the inner case branches, using the expression in each branch as the head of that case-expression. Continuing the transformation on each branch with ordinary reduction steps yields:

case ys of
  [] -> 0
  (x' : xs') -> square x' + sum (map square xs')

At this point we inline the body of the first occurrence of square and observe that the second argument of (+) is similar to the expression we started with, so we replace it with h0 xs'. The result of our transformation is h0 ys, with h0 defined as:

h0 ys = case ys of
          [] -> 0
          (x' : xs') -> x' * x' + h0 xs'
This new function only traverses its input once, and no intermediate structures are created. If the expression sum (map square xs), or a renaming of it, is detected elsewhere in the input, a call to h0 will be inserted instead.

The work by Ohori and Sasano [35] cannot fuse two successive applications of the same function, nor mutually recursive functions. We show in the next two examples that our transformation can handle these cases. We need the following new function definitions:

mapsq xs = case xs of
             [] -> []
             (x' : xs') -> (x' * x') : mapsq xs'

f xs = case xs of
         [] -> []
         (x' : xs') -> (2 * x') : g xs'

g xs = case xs of
         [] -> []
         (x' : xs') -> (3 * x') : f xs'

Transforming mapsq (mapsq xs) will inline the outer mapsq, substitute the argument in the function body and inline the inner call to mapsq:

case (case xs of
        [] -> []
        (x' : xs') -> (x' * x') : mapsq xs') of
  [] -> []
  (x' : xs') -> (x' * x') : mapsq xs'

As previously, we duplicate the outer case in each of the inner case branches, using the expression in each branch as the head of that case-expression. Continuing the transformation on each branch by ordinary reduction steps yields:

case xs of
  [] -> []
  (x' : xs') -> (x' * x' * x' * x') : mapsq (mapsq xs')

Here we encounter an expression similar to the one we started with, and create a new function h1. The final result of our transformation is h1 xs, where the new residual function h1, which only traverses its input once, is defined as:

h1 xs = case xs of
          [] -> []
          (x' : xs') -> (x' * x' * x' * x') : h1 xs'

For an example of transforming mutually recursive functions, consider the transformation of sum (f xs). Inlining the body of sum, substituting its arguments in the function body and inlining the body of f yields:

case (case xs of
        [] -> []
        (x' : xs') -> (2 * x') : g xs') of
  [] -> 0
  (x' : xs') -> x' + sum xs'

We now move the outer case down into each branch, and perform reductions until we end up with:

case xs of
  [] -> 0
  (x' : xs') -> (2 * x') + sum (g xs')

Notice that, unlike in the previous examples, sum (g xs') is not similar to what we started transforming, so we can continue the transformation. For space reasons, we focus on the transformation of the rightmost expression in the last branch, sum (g xs'), while keeping the functions already seen in mind. We inline the body of sum, substitute its arguments and inline the body of g:

case (case xs' of
        [] -> []
        (x'' : xs'') -> (3 * x'') : f xs'') of
  [] -> 0
  (x' : xs') -> x' + sum xs'

We now move the outer case down into each branch, and perform reductions:

case xs' of
  [] -> 0
  (x'' : xs'') -> (3 * x'') + sum (f xs'')

We notice a familiar expression in sum (f xs''), and fold when reaching it. Combining the fragments gives a new function h2:

h2 xs = case xs of
          [] -> 0
          (x' : xs') -> (2 * x') + case xs' of
                                     [] -> 0
                                     (x'' : xs'') -> (3 * x'') + h2 xs''

The new function h2 consumes a list and returns a number, so our algorithm has eliminated the intermediate list between f and sum.
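As a quick sanity check of this example, the Python sketch below (illustrative only) compares sum (f xs) with the residual h2 on a few inputs. Python lists stand in for the object language's lists, and slicing merely stands in for pattern matching on (x' : xs'); these are assumptions of the sketch, not part of the paper.

```python
# Hypothetical check: the residual h2 agrees with sum (f xs), where f and g
# are the mutually recursive functions defined earlier in this section.

def f(xs):
    # f (x' : xs') = (2 * x') : g xs'
    return [] if not xs else [2 * xs[0]] + g(xs[1:])


def g(xs):
    # g (x' : xs') = (3 * x') : f xs'
    return [] if not xs else [3 * xs[0]] + f(xs[1:])


def h2(xs):
    # mirrors the residual h2: returns a number, builds no result list
    if not xs:
        return 0
    x1, rest = xs[0], xs[1:]
    if not rest:
        return 2 * x1 + 0
    return 2 * x1 + (3 * rest[0] + h2(rest[1:]))


print(sum(f([1, 2, 3, 4, 5])), h2([1, 2, 3, 4, 5]))  # 36 36
```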

Kort [23] studied a ray-tracer written in Haskell, and identified a critical function in the innermost loop of a matrix multiplication, called vecDot:

vecDot xs ys = sum (zipWith (*) xs ys)

This is simplified by our positive supercompiler to:

vecDot xs ys = h1 xs ys

h1 xs ys = case xs of
             (x' : xs') -> case ys of
                             (y' : ys') -> x' * y' + h1 xs' ys'
                             _ -> 0
             _ -> 0

The intermediate list between sum and zipWith is transformed away, and the complexity is reduced from 2|xs| + |ys| to |xs| + |ys| (since this is matrix multiplication, |xs| = |ys|).
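The effect of the transformation on vecDot can be mimicked in Python. This is only a sketch: the names `vec_dot_unfused` and `h1` are ours, and the index parameter `i` stands in for pattern matching on the two lists. The unfused version materialises the list of products before summing, whereas `h1` makes a single pass with no intermediate list.

```python
def vec_dot_unfused(xs, ys):
    # vecDot xs ys = sum (zipWith (*) xs ys): builds the product list first
    products = [x * y for x, y in zip(xs, ys)]
    return sum(products)


def h1(xs, ys, i=0):
    # residual program: one pass, no intermediate list; the index i
    # stands in for matching (x' : xs') and (y' : ys')
    if i < len(xs) and i < len(ys):
        return xs[i] * ys[i] + h1(xs, ys, i + 1)
    return 0  # either list exhausted: the "_ -> 0" branches


print(vec_dot_unfused([1, 2, 3], [4, 5, 6]), h1([1, 2, 3], [4, 5, 6]))  # 32 32
```

Like zipWith, both versions stop at the end of the shorter list.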
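$$
\begin{aligned}
&\textbf{Expressions} \\
&e, f ::= n \mid x \mid g \mid f\,e \mid \lambda x.e \mid k\,\overline{e} \mid e_1 \oplus e_2 \mid \mathbf{case}\ e\ \mathbf{of}\ \{\overline{p_i \to e_i}\} \\
&\qquad\quad \mid \mathbf{let}\ x = f\ \mathbf{in}\ e \mid \mathbf{letrec}\ g = v\ \mathbf{in}\ e \\
&p ::= n \mid k\,\overline{x} \\[4pt]
&\textbf{Values} \\
&v ::= n \mid \lambda x.e \mid k\,\overline{v}
\end{aligned}
$$

Figure 2: The language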

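$$
\begin{aligned}
fv(x) &= \{x\} \\
fv(n) &= \emptyset \\
fv(g) &= \emptyset \\
fv(k\,\overline{e}) &= fv(\overline{e}) \\
fv(\lambda x.e) &= fv(e) \setminus \{x\} \\
fv(f\,e) &= fv(f) \cup fv(e) \\
fv(\mathbf{let}\ x = e\ \mathbf{in}\ f) &= fv(e) \cup (fv(f) \setminus \{x\}) \\
fv(\mathbf{letrec}\ g = v\ \mathbf{in}\ f) &= fv(v) \cup fv(f) \\
fv(\mathbf{case}\ e\ \mathbf{of}\ \{\overline{p_i \to e_i}\}) &= fv(e) \cup \big(\textstyle\bigcup_i (fv(e_i) \setminus fv(p_i))\big) \\
fv(e_1 \oplus e_2) &= fv(e_1) \cup fv(e_2)
\end{aligned}
$$

Figure 3: Free variables of an expression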
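Figure 3 translates almost line by line into a recursive function. The Python sketch below uses a tuple encoding of the syntax in Figure 2 that is our own hypothetical choice (tags such as "var", "lam", "case" and the patterns "pnum"/"pcon" are not from the paper), and checks the definition on the body of append.

```python
# Hypothetical tuple encoding of the language in Figure 2:
#   ("var", x) ("num", n) ("fun", g) ("app", f, e) ("lam", x, e)
#   ("con", k, [e1, ..., en]) ("prim", op, e1, e2)
#   ("case", e, [(pattern, body), ...]) ("let", x, e, f) ("letrec", g, v, f)
# Patterns are ("pnum", n) or ("pcon", k, [x1, ..., xn]).

def fv_pat(p):
    return set() if p[0] == "pnum" else set(p[2])


def fv(e):
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag in ("num", "fun"):
        return set()
    if tag == "con":
        return set().union(*(fv(a) for a in e[2]))
    if tag == "lam":
        return fv(e[2]) - {e[1]}
    if tag == "app":
        return fv(e[1]) | fv(e[2])
    if tag == "let":
        return fv(e[2]) | (fv(e[3]) - {e[1]})
    if tag == "letrec":
        return fv(e[2]) | fv(e[3])
    if tag == "case":
        return fv(e[1]) | set().union(*(fv(b) - fv_pat(p) for p, b in e[2]))
    if tag == "prim":
        return fv(e[2]) | fv(e[3])
    raise ValueError(f"unknown expression tag {tag!r}")


# Body of append: case xs of [] -> ys; (x' : xs') -> x' : append xs' ys
append_body = ("case", ("var", "xs"), [
    (("pcon", "Nil", []), ("var", "ys")),
    (("pcon", "Cons", ["x1", "xs1"]),
     ("con", "Cons", [("var", "x1"),
                      ("app", ("app", ("fun", "append"), ("var", "xs1")), ("var", "ys"))])),
])
print(sorted(fv(append_body)))  # ['xs', 'ys']
```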

3. Language 

Our language of study is a strict, higher-order functional language with let-bindings and case-expressions. Its syntax for expressions, values and patterns is shown in Figure 2.

Here we let variables and constructor symbols be denoted by x and k, respectively. The constructor symbols k range over a set K, and we also assume that there is a separate set $\mathcal{G}$ of recursively defined function symbols, ranged over by g. In what follows we will assume that the meaning of such symbols is given by a recursive map G mapping symbols g to their defined value.

The language contains integer values n and arithmetic operations $\oplus$, although these meta-variables can preferably be understood as ranging over primitive values in general and arbitrary operations on these. We let $+$ denote the semantic meaning of $\oplus$.

A list of expressions $e_1 \ldots e_n$ is abbreviated as $\overline{e}$, and a list of variables $x_1 \ldots x_n$ as $\overline{x}$.

We denote the free variables of an expression e by fv(e), as defined in Figure 3. Along the same lines, we denote the function names in an expression e as fn(e), defined in Figure 4.

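We encode letrec as an application containing fix, where fix is defined as

$$\mathit{fix} = \lambda f.\,f\,(\lambda n.\,\mathit{fix}\,f\,n)$$

Definition 3.1. Letrec is defined as:

$$\mathbf{letrec}\ h = \lambda\overline{x}.e\ \mathbf{in}\ e' \;=\; (\lambda h.e')\,(\lambda\overline{y}.\,\mathit{fix}\,(\lambda h.\lambda\overline{x}.e)\,\overline{y})$$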
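$$
\begin{aligned}
fn(x) &= \emptyset \\
fn(n) &= \emptyset \\
fn(g) &= \{g\} \\
fn(k\,\overline{e}) &= fn(\overline{e}) \\
fn(\lambda x.e) &= fn(e) \\
fn(f\,e) &= fn(f) \cup fn(e) \\
fn(\mathbf{let}\ x = e\ \mathbf{in}\ f) &= fn(e) \cup fn(f) \\
fn(\mathbf{letrec}\ g = v\ \mathbf{in}\ f) &= (fn(v) \cup fn(f)) \setminus \{g\} \\
fn(\mathbf{case}\ e\ \mathbf{of}\ \{\overline{p_i \to e_i}\}) &= fn(e) \cup \big(\textstyle\bigcup_i fn(e_i)\big) \\
fn(e_1 \oplus e_2) &= fn(e_1) \cup fn(e_2)
\end{aligned}
$$

Figure 4: Function names of an expression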



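Reduction contexts

$$
\mathcal{E} ::= \square \mid \mathcal{E}\,e \mid (\lambda x.e)\,\mathcal{E} \mid k\,\overline{\mathcal{E}} \mid \mathcal{E} \oplus e \mid n \oplus \mathcal{E} \mid \mathbf{case}\ \mathcal{E}\ \mathbf{of}\ \{\overline{p_i \to e_i}\} \mid \mathbf{let}\ x = \mathcal{E}\ \mathbf{in}\ e
$$

Evaluation relation

$$
\begin{array}{lll}
\mathcal{E}\langle g\rangle \mapsto \mathcal{E}\langle v\rangle, & \text{if } (g, v) \in G & \text{(Global)} \\
\mathcal{E}\langle(\lambda x.e)\,v\rangle \mapsto \mathcal{E}\langle[v/x]e\rangle & & \text{(App)} \\
\mathcal{E}\langle\mathbf{let}\ x = v\ \mathbf{in}\ e\rangle \mapsto \mathcal{E}\langle[v/x]e\rangle & & \text{(Let)} \\
\mathcal{E}\langle\mathbf{case}\ k\,\overline{v}\ \mathbf{of}\ \{\overline{k_i\,\overline{x}_i \to e_i}\}\rangle \mapsto \mathcal{E}\langle[\overline{v}/\overline{x}_j]e_j\rangle, & \text{if } k = k_j & \text{(KCase)} \\
\mathcal{E}\langle\mathbf{case}\ n\ \mathbf{of}\ \{\overline{n_i \to e_i}\}\rangle \mapsto \mathcal{E}\langle e_j\rangle, & \text{if } n = n_j & \text{(NCase)} \\
\mathcal{E}\langle n_1 \oplus n_2\rangle \mapsto \mathcal{E}\langle n\rangle, & \text{if } n = n_1 + n_2 & \text{(Arith)}
\end{array}
$$

Figure 5: Reduction semantics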

By defining letrec as syntactic sugar for other primitives, we introduce an implicit requirement that the right-hand side of letrec expressions must not contain any free variables except h. This is not a limitation, since functions that contain free variables can be lambda lifted [r?] to the top level.

A program is an expression with no free variables and all function names defined in G. The intended operational semantics is given in Figure 5, where $[\overline{e}/\overline{x}]e'$ is the capture-free substitution of expressions $\overline{e}$ for variables $\overline{x}$ in e'.

A reduction context $\mathcal{E}$ is a term containing a single hole, $\square$, which indicates the next expression to be reduced. The expression $\mathcal{E}\langle e\rangle$ is the term obtained by replacing the hole in $\mathcal{E}$ with e. $\overline{\mathcal{E}}$ denotes a list of terms with just a single hole, evaluated from left to right.

If a variable appears no more than once in a term, that term is said to be linear with respect to that variable. Like Wadler [57], we extend the definition slightly for linear case-expressions: no variable may appear in both the head and a branch, although a variable may appear in more than one branch. For example, the definition of append is linear with respect to ys, although ys appears in both branches.
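The extended notion of linearity can be phrased as an occurrence count in which a case-expression contributes the occurrences in its head plus the maximum over its branches. The Python sketch below is a hypothetical reading of the definition above; the tuple encoding and the names `occurrences` and `is_linear` are ours, and letrec is deliberately left out.

```python
# Hypothetical tuple encoding (ours, not the paper's):
#   ("var", x) ("num", n) ("fun", g) ("app", f, e) ("lam", x, e)
#   ("con", k, [args]) ("prim", op, e1, e2) ("let", x, e, f)
#   ("case", e, [(pattern, body), ...]), patterns ("pnum", n) | ("pcon", k, [xs])

def occurrences(x, e):
    """Count occurrences of x, treating a case as head + max over branches."""
    tag = e[0]
    if tag == "var":
        return 1 if e[1] == x else 0
    if tag in ("num", "fun"):
        return 0
    if tag == "con":
        return sum(occurrences(x, a) for a in e[2])
    if tag == "lam":
        return 0 if e[1] == x else occurrences(x, e[2])
    if tag == "app":
        return occurrences(x, e[1]) + occurrences(x, e[2])
    if tag == "prim":
        return occurrences(x, e[2]) + occurrences(x, e[3])
    if tag == "let":
        in_body = 0 if e[1] == x else occurrences(x, e[3])
        return occurrences(x, e[2]) + in_body
    if tag == "case":
        per_branch = [0 if p[0] == "pcon" and x in p[2] else occurrences(x, b)
                      for p, b in e[2]]
        return occurrences(x, e[1]) + max(per_branch, default=0)
    raise ValueError(f"unsupported expression tag {tag!r}")  # letrec omitted


def is_linear(x, e):
    return occurrences(x, e) <= 1


append_body = ("case", ("var", "xs"), [
    (("pcon", "Nil", []), ("var", "ys")),
    (("pcon", "Cons", ["x1", "xs1"]),
     ("con", "Cons", [("var", "x1"),
                      ("app", ("app", ("fun", "append"), ("var", "xs1")), ("var", "ys"))])),
])
print(is_linear("ys", append_body))  # True: ys occurs once in each branch
```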
4. Higher Order Positive Supercompilation

It is time to make the intuition developed in Section 2 more formal. Our supercompiler is defined as a set of rewrite rules that pattern-match on expressions. This algorithm is called the driving algorithm, and is defined in Figure 6. Three additional parameters appear as subscripts to the rewrite rules: a driving context $\mathcal{R}$, the set of global function definitions G, and a memoization list $\rho$. The memoization list holds information about expressions already traversed and is explained in more detail in Section 4.1. The driving context $\mathcal{R}$ is smaller than $\mathcal{E}$, and is defined as follows:

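$$\mathcal{R} ::= \square \mid \mathcal{R}\,e \mid \mathbf{case}\ \mathcal{R}\ \mathbf{of}\ \{\overline{p_i \to e_i}\} \mid \mathcal{R} \oplus e \mid e \oplus \mathcal{R}$$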

Interestingly, this definition coincides with the evaluation contexts for a call-by-name language. The reason our transformation still preserves call-by-value semantics is that beta reduction (rule R9) results in a let-binding, whose further specialization in rule R13 depends on whether the body expression f is strict in the bound variable x or not.

Our let-rule (R13) might change the order of computations, but since non-termination 
is commutative this does not matter in practice. Supercompiling impure languages requires 
stronger conditions for the let-rule, since expressions might contain effects other than non- 
termination. The difficulty of supercompiling an impure language is to find sufficient con- 
ditions that preserve soundness while still allowing the maximum amount of reordering of 
expressions. 

In principle, an expression e is strict with respect to a variable x if evaluation of e eventually requires the value of x; in other words, if $e \mapsto \cdots \mapsto \mathcal{E}\langle x\rangle$. Such information is not computable in general, although call-by-value semantics allows for reasonably tight approximations. One such approximation is given in Figure 7, where the strict variables of an expression e are defined as all free variables of e except those that only appear under a lambda or not inside all branches of a case.

There is an ordering between the driving rules; i.e., the rules must be tried in the order they appear. Rule R10 is the default fallback case for applications, and rule R19 is the default fallback case for case-expressions. These rules extend the driving context $\mathcal{R}$ and zoom in on the next expression to be driven. The program is turned "inside-out" by moving the surrounding context $\mathcal{R}$ into all branches of the case-expression through rules R15 and R18. Rule R13 has a similar mechanism for let-expressions. Notice how the context is moved out of the recursive call in rule R5, whereas rule R7 recursively applies the driving algorithm to the full new term $\mathcal{R}\langle n\rangle$, forcing a re-traversal of the new term in search of further reduction opportunities. Rule R12 is only allowed to match if the variable y is not freshly generated by the splitting mechanism described in the next section. The meta-variable a in rules R8 and R18 stands for an "annoying" expression; i.e., an expression that would be further reducible were it not for a free variable getting in the way. The grammar for annoying expressions is:

$$a ::= x \mid n \oplus a \mid a \oplus n \mid a \oplus a \mid a\,e$$
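The grammar for annoying expressions corresponds directly to a recursive predicate. The sketch below is a hypothetical Python transcription over our own tuple encoding ("var", "num", "prim", "app", ...); it is not taken from the paper's implementation.

```python
# Hypothetical tuple encoding: ("var", x) ("num", n) ("fun", g)
# ("app", f, e) ("lam", x, e) ("prim", op, e1, e2) ...

def is_annoying(e):
    """a ::= x | n (+) a | a (+) n | a (+) a | a e"""
    tag = e[0]
    if tag == "var":
        return True
    if tag == "prim":
        left, right = e[2], e[3]
        if left[0] == "num":
            return is_annoying(right)       # n (+) a
        if right[0] == "num":
            return is_annoying(left)        # a (+) n
        return is_annoying(left) and is_annoying(right)  # a (+) a
    if tag == "app":
        return is_annoying(e[1])            # a e
    return False


print(is_annoying(("prim", "+", ("num", 1), ("var", "x"))))  # True
```

Note that 1 + 2 is not annoying (rule R7 reduces it), and neither is x + fac 3, since fac 3 can still be driven.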

Some expressions should be handled differently depending on context. If a constructor application appears in an empty context, there is not much we can do but drive the argument expressions (rule R4). On the other hand, if the application occurs at the head of a case-expression, we may choose a branch on the basis of the constructor and leave the arguments unevaluated in the hope of finding fold opportunities further down the syntax tree (rule R16).



POSITIVE SUPERCOMPILATION 



11 



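$$
\begin{array}{lcll}
\mathcal{D}[\![n]\!]_{\mathcal{R},G,\rho} &=& \mathcal{R}\langle n\rangle & \text{(R1)} \\
\mathcal{D}[\![x]\!]_{\mathcal{R},G,\rho} &=& \mathcal{R}\langle x\rangle & \text{(R2)} \\
\mathcal{D}[\![g]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}_{app}(g)_{\mathcal{R},G,\rho} & \text{(R3)} \\
\mathcal{D}[\![k\,\overline{e}]\!]_{\square,G,\rho} &=& k\,\overline{\mathcal{D}[\![e]\!]_{\square,G,\rho}} & \text{(R4)} \\
\mathcal{D}[\![x\,\overline{e}]\!]_{\mathcal{R},G,\rho} &=& \mathcal{R}\langle x\,\overline{\mathcal{D}[\![e]\!]_{\square,G,\rho}}\rangle & \text{(R5)} \\
\mathcal{D}[\![\lambda x.e]\!]_{\square,G,\rho} &=& (\lambda x.\,\mathcal{D}[\![e]\!]_{\square,G,\rho}) & \text{(R6)} \\
\mathcal{D}[\![n_1 \oplus n_2]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![\mathcal{R}\langle n\rangle]\!]_{\square,G,\rho}, \text{ where } n = n_1 + n_2 & \text{(R7)} \\
\mathcal{D}[\![e_1 \oplus e_2]\!]_{\mathcal{R},G,\rho} &=& \mathcal{R}\langle\mathcal{D}[\![e_1]\!]_{\square,G,\rho} \oplus \mathcal{D}[\![e_2]\!]_{\square,G,\rho}\rangle, \text{ if } e_1 \oplus e_2 = a & \text{(R8)} \\
&=& \mathcal{D}[\![e_2]\!]_{\mathcal{R}\langle e_1 \oplus \square\rangle,G,\rho}, \text{ if } e_1 = n \text{ or } e_1 = a & \\
&=& \mathcal{D}[\![e_1]\!]_{\mathcal{R}\langle \square \oplus e_2\rangle,G,\rho}, \text{ otherwise} & \\
\mathcal{D}[\![(\lambda x.f)\,e]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![\mathbf{let}\ x = e\ \mathbf{in}\ f]\!]_{\mathcal{R},G,\rho} & \text{(R9)} \\
\mathcal{D}[\![e\,e']\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![e]\!]_{\mathcal{R}\langle\square\,e'\rangle,G,\rho} & \text{(R10)} \\
\mathcal{D}[\![\mathbf{let}\ x = n\ \mathbf{in}\ f]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![\mathcal{R}\langle[n/x]f\rangle]\!]_{\square,G,\rho} & \text{(R11)} \\
\mathcal{D}[\![\mathbf{let}\ x = y\ \mathbf{in}\ f]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![\mathcal{R}\langle[y/x]f\rangle]\!]_{\square,G,\rho}, \text{ if } y \text{ not freshly generated} & \text{(R12)} \\
\mathcal{D}[\![\mathbf{let}\ x = e\ \mathbf{in}\ f]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![\mathcal{R}\langle[e/x]f\rangle]\!]_{\square,G,\rho}, \text{ if } x \in strict(f) \text{ and } x \in linear(f) & \text{(R13)} \\
&=& \mathbf{let}\ x = \mathcal{D}[\![e]\!]_{\square,G,\rho}\ \mathbf{in}\ \mathcal{D}[\![\mathcal{R}\langle f\rangle]\!]_{\square,G,\rho}, \text{ otherwise} & \\
\mathcal{D}[\![\mathbf{letrec}\ g = v\ \mathbf{in}\ e]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![\mathcal{R}\langle e\rangle]\!]_{\square,G',\rho}, \text{ where } G' = G \cup (g, v) & \text{(R14)} \\
\mathcal{D}[\![\mathbf{case}\ x\ \mathbf{of}\ \{\overline{p_i \to e_i}\}]\!]_{\mathcal{R},G,\rho} &=& \mathbf{case}\ x\ \mathbf{of}\ \{\overline{p_i \to \mathcal{D}[\![[p_i/x]\mathcal{R}\langle e_i\rangle]\!]_{\square,G,\rho}}\} & \text{(R15)} \\
\mathcal{D}[\![\mathbf{case}\ k_j\,\overline{e}\ \mathbf{of}\ \{\overline{k_i\,\overline{x}_i \to e_i}\}]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![\mathcal{R}\langle\mathbf{let}\ \overline{x}_j = \overline{e}\ \mathbf{in}\ e_j\rangle]\!]_{\square,G,\rho} & \text{(R16)} \\
\mathcal{D}[\![\mathbf{case}\ n_j\ \mathbf{of}\ \{\overline{n_i \to e_i}\}]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![\mathcal{R}\langle e_j\rangle]\!]_{\square,G,\rho} & \text{(R17)} \\
\mathcal{D}[\![\mathbf{case}\ a\ \mathbf{of}\ \{\overline{p_i \to e_i}\}]\!]_{\mathcal{R},G,\rho} &=& \mathbf{case}\ \mathcal{D}[\![a]\!]_{\square,G,\rho}\ \mathbf{of}\ \{\overline{p_i \to \mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\square,G,\rho}}\} & \text{(R18)} \\
\mathcal{D}[\![\mathbf{case}\ e\ \mathbf{of}\ \{\overline{p_i \to e_i}\}]\!]_{\mathcal{R},G,\rho} &=& \mathcal{D}[\![e]\!]_{\mathcal{R}\langle\mathbf{case}\ \square\ \mathbf{of}\ \{\overline{p_i \to e_i}\}\rangle,G,\rho} & \text{(R19)} \\
\mathcal{D}[\![e]\!]_{\mathcal{R},G,\rho} &=& \mathcal{R}\langle e\rangle & \text{(R20)}
\end{array}
$$

Figure 6: Driving algorithm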



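$$
\begin{aligned}
strict(x) &= \{x\} \\
strict(n) &= \emptyset \\
strict(g) &= \emptyset \\
strict(k\,\overline{e}) &= strict(\overline{e}) \\
strict(\lambda x.e) &= \emptyset \\
strict(f\,e) &= strict(f) \cup strict(e) \\
strict(\mathbf{let}\ x = e\ \mathbf{in}\ f) &= strict(e) \cup (strict(f) \setminus \{x\}) \\
strict(\mathbf{letrec}\ g = v\ \mathbf{in}\ f) &= strict(f) \\
strict(\mathbf{case}\ e\ \mathbf{of}\ \{\overline{p_i \to e_i}\}) &= strict(e) \cup \big(\textstyle\bigcap_i (strict(e_i) \setminus fv(p_i))\big) \\
strict(e_1 \oplus e_2) &= strict(e_1) \cup strict(e_2)
\end{aligned}
$$

Figure 7: The strict variables of an expression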
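Figure 7 can be transcribed into Python in the same style as a recursive function; the intersection over case branches is what makes the analysis conservative. The tuple encoding below is our own hypothetical choice, not the paper's. The final example mirrors the program (λx.y) (fac z) from the introduction: after rule R9 turns it into let x = fac z in y, x is not strict in the body, so rule R13 keeps the let-binding instead of discarding fac z.

```python
# Hypothetical tuple encoding: ("var", x) ("num", n) ("fun", g) ("app", f, e)
# ("lam", x, e) ("con", k, [args]) ("prim", op, e1, e2) ("let", x, e, f)
# ("letrec", g, v, f) ("case", e, [(pattern, body), ...]);
# patterns ("pnum", n) | ("pcon", k, [xs]).

def pat_vars(p):
    return set() if p[0] == "pnum" else set(p[2])


def strict(e):
    tag = e[0]
    if tag == "var":
        return {e[1]}
    if tag in ("num", "fun", "lam"):
        return set()  # variables under a lambda are never considered strict
    if tag == "con":
        return set().union(*(strict(a) for a in e[2]))
    if tag == "app":
        return strict(e[1]) | strict(e[2])
    if tag == "prim":
        return strict(e[2]) | strict(e[3])
    if tag == "let":
        return strict(e[2]) | (strict(e[3]) - {e[1]})
    if tag == "letrec":
        return strict(e[3])
    if tag == "case":
        # only variables required in every branch count
        branch_sets = [strict(b) - pat_vars(p) for p, b in e[2]]
        common = set.intersection(*branch_sets) if branch_sets else set()
        return strict(e[1]) | common
    raise ValueError(f"unknown expression tag {tag!r}")


append_body = ("case", ("var", "xs"), [
    (("pcon", "Nil", []), ("var", "ys")),
    (("pcon", "Cons", ["x1", "xs1"]),
     ("con", "Cons", [("var", "x1"),
                      ("app", ("app", ("fun", "append"), ("var", "xs1")), ("var", "ys"))])),
])
print(sorted(strict(append_body)))  # ['xs', 'ys']

# let x = fac z in y: is x strict in the body y?
print("x" in strict(("var", "y")))  # False, so R13 keeps the let
```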



The argumentation is analogous for lambda abstractions: if there is a surrounding 
application context we perform a beta reduction, otherwise we proceed by driving the 
abstraction itself. 

Notice that the primitive operations ranged over by © cannot be unfolded and trans- 
formed like ordinary functions can. If the arguments of a primitive operation are annoying, 
our transformation will simply leave the primitive operation in place (rule R8). 



12 P. A. JONSSON 



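$$
\begin{array}{lcll}
\mathcal{D}_{app}(g)_{\mathcal{R},G,\rho} &=& h\,\overline{x}, \text{ if } \exists (h, e_1) \in \rho\,.\,\sigma e_1 \equiv \mathcal{R}\langle g\rangle, \text{ where } \overline{x} = \sigma(fv(e_1)) & (1) \\
\mathcal{D}_{app}(g)_{\mathcal{R},G,\rho} &=& \mathcal{R}\langle g\rangle, \text{ if } \exists (h, e_1) \in \rho\,.\,e_1 \trianglelefteq \mathcal{R}\langle g\rangle \text{ and } \mathcal{R}\langle g\rangle \trianglelefteq e_1 & (2) \\
\mathcal{D}_{app}(g)_{\mathcal{R},G,\rho} &=& [\overline{\mathcal{D}[\![f]\!]_{\square,G,\rho}}/\overline{y}]\,\mathcal{D}[\![f_g]\!]_{\square,G,\rho}, \text{ if } \exists (h, e_1) \in \rho\,.\,e_1 \trianglelefteq \mathcal{R}\langle g\rangle & (3) \\
&& \quad \text{where } (f_g, \overline{f}, \overline{y}) = split(\mathcal{R}\langle g\rangle, e_1) & \\
\mathcal{D}_{app}(g)_{\mathcal{R},G,\rho} &=& [\overline{\mathcal{D}[\![f]\!]_{\square,G,\rho}}/\overline{y}]\,\mathcal{D}[\![f_g]\!]_{\square,G,\rho}, \text{ if } \exists e_1 \in e\,.\,e_1 \trianglelefteq \mathcal{R}\langle g\rangle & (4a) \\
&=& \mathbf{letrec}\ h = \lambda\overline{x}.e\ \mathbf{in}\ h\,\overline{x}, \text{ if } h \in fn(e) & (4b) \\
&=& e, \text{ otherwise} & (4c)
\end{array}
$$

where $(g, v) \in G$, $e = \mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\square,G,\rho'}$, $\rho' = \rho \cup (h, \mathcal{R}\langle g\rangle)$, h fresh, $\overline{x} = fv(\mathcal{R}\langle g\rangle)$ and $(f_g, \overline{f}, \overline{y}) = split(\mathcal{R}\langle g\rangle, e_1)$.

Figure 8: Driving of applications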

If we had a perfect strictness analysis and could decide whether an arbitrary expression will terminate or not, the only difference in results between our transformation and a call-by-name counterpart would be for the non-terminating cases. In practice, we have to settle for an approximation, such as the simple analysis defined in Figure 7. One might speculate whether the transformations thus missed will have adverse effects on the usefulness of our transformation in practice. We believe we have seen clear indications that this is not the case, and that the crucial factor is the ability to inline function bodies irrespective of whether arguments are values or not.

Our transformation always inlines functions unless the algorithm detects a risk of non-termination. Supero [30, Sec. 3.2] has a more advanced inlining strategy.

4.1. Application Rule. In the driving algorithm, rule R3 refers to $\mathcal{D}_{app}()$, defined in Figure 8. $\mathcal{D}_{app}()$ could be inlined in the definition of the driving algorithm; it is merely given a separate name to improve the clarity of the presentation. Figure 8 contains some new notation: we use $\sigma$ for a variable-to-variable substitution and $\equiv$ for syntactic equivalence of expressions.

Care needs to be taken to ensure that recursive functions are not inlined forever. The driving algorithm keeps track of previously seen function applications in the memoization list $\rho$, which also associates a unique function name with each such expression. Whenever an expression is equivalent, up to renaming of variables, to a previous application, a call to the associated function symbol is inserted instead. This is not sufficient to guarantee termination of the algorithm, but the mechanism is crucial for the complexity improvements mentioned in Section 2.

To ensure termination, we use the homeomorphic embedding relation $\trianglelefteq$ to define a predicate called "the whistle". When the predicate holds for an expression, we say that the whistle blows on that expression. The intuition is that when $e \trianglelefteq f$, f contains all subexpressions of e, possibly embedded in other expressions. For any infinite sequence $e_0, e_1, \ldots$ there must exist an i and a j such that $i < j$ and $e_i \trianglelefteq e_j$. This condition is sufficient to ensure termination.
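$$
\begin{array}{rcl|l|l|l}
e & & f & t_g & \theta_1 & \theta_2 \\ \hline
e & \trianglelefteq & \mathit{Just}\ e & x & [e/x] & [\mathit{Just}\ e/x] \\
\mathit{Right}\ e & \trianglelefteq & \mathit{Right}\ (e, e') & \mathit{Right}\ x & [e/x] & [(e, e')/x] \\
\mathit{fac}\ y & \trianglelefteq & \mathit{fac}\ (y - 1) & \mathit{fac}\ x & [y/x] & [(y - 1)/x]
\end{array}
$$

Figure 9: Examples of the homeomorphic embedding and the msg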

In order to define the homeomorphic embedding we need a definition of uniform terms analogous to the one defined by Sørensen and Glück [45]. We slightly adjust their version to fit our language.

Definition 4.1 (Uniform terms). Let s range over the set $\mathcal{G} \cup K \cup \{caseof, let, letrec, primop, lambda, apply\}$, and let $caseof(\overline{e})$, $let(\overline{e})$, $letrec(\overline{v}, e)$, $primop(\overline{e})$, $lambda(e)$ and $apply(\overline{e})$ denote a case, let, recursive let, primitive operation, lambda abstraction or application for all subexpressions e, $\overline{e}$ and $\overline{v}$. The set of terms T is the smallest set of arity-respecting symbol applications $s(\overline{e})$.

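Definition 4.2 (Homeomorphic embedding). Define $\trianglelefteq$ as the smallest relation on T satisfying:

$$
\frac{e \trianglelefteq f_i \text{ for some } i}{e \trianglelefteq s(f_1, \ldots, f_n)}
\qquad
\frac{e_1 \trianglelefteq f_1, \ \ldots, \ e_n \trianglelefteq f_n}{s(e_1, \ldots, e_n) \trianglelefteq s(f_1, \ldots, f_n)}
$$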
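The two rules of Definition 4.2 are usually called diving (the left rule) and coupling (the right rule). Below is a minimal Python sketch of the relation and of a whistle over uniform terms encoded as (symbol, children) tuples. The encoding, the names `embeds`, `dives`, `couples` and `whistle`, and the convention that any two variables embed in each other (as in Sørensen and Glück's formulation) are assumptions of the sketch. The tests replay the embeddings of Figure 9.

```python
# Hypothetical encoding of uniform terms (Definition 4.1): a variable is
# ("var", name); every other term is (symbol, (child1, ..., childn)).

def is_var(t):
    return t[0] == "var"


def embeds(e, f):
    """e ⊴ f: e is homeomorphically embedded in f."""
    if is_var(e) and is_var(f):
        return True  # assumption: any two variables embed in each other
    return dives(e, f) or couples(e, f)


def dives(e, f):
    # e ⊴ f_i for some child f_i of f
    return not is_var(f) and any(embeds(e, c) for c in f[1])


def couples(e, f):
    # same root symbol and arity, children embedded pairwise
    return (not is_var(e) and not is_var(f)
            and e[0] == f[0] and len(e[1]) == len(f[1])
            and all(embeds(a, b) for a, b in zip(e[1], f[1])))


def whistle(history, t):
    """Blows if some previously seen term is embedded in t."""
    return any(embeds(h, t) for h in history)


nil = ("Nil", ())
fac = ("fac", ())
fac_y = ("apply", (fac, ("var", "y")))
fac_y_minus_1 = ("apply", (fac, ("primop", (("var", "y"), ("1", ())))))
print(embeds(nil, ("Just", (nil,))),
      embeds(fac_y, fac_y_minus_1),
      embeds(fac_y_minus_1, fac_y))  # True True False
```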

Whenever the whistle blows, our transformation splits the input expression into strictly smaller terms that are driven separately in the empty context. This might expose new folding opportunities, and allows the algorithm to remove intermediate structures in subexpressions. The design follows the positive supercompilation algorithm outlined by Sørensen [44], except that we need to reassemble the transformed subexpressions into a term of the original form instead of pulling them out as let-definitions, in order to preserve strictness. Our transformation is also more complicated because we perform the program extraction immediately, rather than constructing a large tree of terms and extracting the program in a separate pass.

Splitting expressions is rather intricate, and two mechanisms are needed; the first is the most specific generalization (msg).

Definition 4.3 (Most specific generalization).

• An instance of a term e is a term of the form $\theta e$ for some substitution $\theta$.

• A generalization of two terms e and f is a triple $(t_g, \theta_1, \theta_2)$, where $\theta_1, \theta_2$ are substitutions such that $\theta_1 t_g \equiv e$ and $\theta_2 t_g \equiv f$.

• A most specific generalization (msg) of two terms e and f is a generalization $(t_g, \theta_1, \theta_2)$ such that for every other generalization $(t'_g, \theta'_1, \theta'_2)$ of e and f it holds that $t_g$ is an instance of $t'_g$.

For background information and an algorithm to compute most specific generalizations, see Lassez et al. [23]. Figure 9 contains examples of the homeomorphic embedding and the msg.

The most specific generalization is not always sufficient to split expressions. For expressions that already differ in their roots, msg will return just a variable, together with substitutions that map this variable to the input terms. If this happens we need to split expressions in a different way. We therefore define our function split using two alternatives: one that applies when there is a non-trivial most specific generalization, and one that just splits along the spine of the first term otherwise.

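Definition 4.4 (Split). For $t_1, t_2 \in T$ we define $split(t_1, t_2)$ by:

$$
\begin{aligned}
split(s(\overline{e_1}), s'(\overline{e_2})) &= (t_g, rng(\theta_1), dom(\theta_1)) && \text{if } s = s' \\
&= (s(\overline{x}), \overline{e_1}, \overline{x}) && \text{otherwise}
\end{aligned}
$$

with $(t_g, \theta_1, \theta_2) = msg(s(\overline{e_1}), s'(\overline{e_2}))$ and $\overline{x}$ fresh.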
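To illustrate Definitions 4.3 and 4.4, the sketch below computes a most specific generalization by first-order anti-unification and builds split on top of it. It is a hypothetical Python rendering over (symbol, children) tuples, not the algorithm of Lassez et al. verbatim. Assumptions: the generated names z0, z1, ... and x0, x1, ... do not clash with variables in the input; identical disagreement pairs reuse one variable, which keeps the result most specific; substitutions are plain dicts from variable names to terms.

```python
import itertools

# Hypothetical encoding: ("var", name) or (symbol, (child1, ..., childn)).

def msg(e, f):
    """Anti-unification: return (tg, theta1, theta2), theta_i as dicts."""
    fresh = itertools.count()
    seen = {}  # (subterm of e, subterm of f) -> generalization variable

    def go(a, b):
        if a == b:
            return a
        if ("var" not in (a[0], b[0]) and a[0] == b[0]
                and len(a[1]) == len(b[1])):
            return (a[0], tuple(go(x, y) for x, y in zip(a[1], b[1])))
        if (a, b) not in seen:
            seen[(a, b)] = ("var", f"z{next(fresh)}")
        return seen[(a, b)]

    tg = go(e, f)
    theta1 = {v[1]: a for (a, _), v in seen.items()}
    theta2 = {v[1]: b for (_, b), v in seen.items()}
    return tg, theta1, theta2


def split(t1, t2):
    """Definition 4.4; t1 must be a symbol application, not a variable."""
    assert t1[0] != "var"
    if t2[0] != "var" and t1[0] == t2[0]:
        tg, theta1, _ = msg(t1, t2)
        names = sorted(theta1)  # dom(theta1), in a fixed order
        return tg, [theta1[n] for n in names], [("var", n) for n in names]
    xs = [("var", f"x{i}") for i in range(len(t1[1]))]
    return (t1[0], tuple(xs)), list(t1[1]), xs  # split along the spine of t1


nil = ("Nil", ())
fac = ("fac", ())
fac_y = ("apply", (fac, ("var", "y")))
minus = ("primop", (("var", "y"), ("1", ())))
fac_y_minus_1 = ("apply", (fac, minus))
print(msg(fac_y, fac_y_minus_1)[0])
```

The generalization of fac y and fac (y - 1) is fac z0, matching the last row of Figure 9; splitting two terms with different roots falls back to the spine of the first term.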

Alternatives 2 and 4a of DappO is for upwards generalization, and alternative 3 is 
for downwards generalization. This is exemplified below. All the examples of how our 
transformation works in Section [2] eventually terminate through a combination of alternative 
1 and alternative 4b of T>app{)- 

The second alternative of "DappO ™ combination with 4a is useful when transforming 
function calls that have the same parameter appearing twice, for example append xs xs as 
shown in Figure [TOl 

The third alternative is used when terms are "growing" in some sense. An example of reverse with an accumulating parameter, assuming the standard definition of reverse, is shown in Figure 11.

5. Correctness 

The problem with using previous deforestation and supercompilation algorithms in a call-by-value context is that they might change the termination properties of programs. In this section we prove that our supercompiler both terminates itself and preserves program termination behavior for all inputs.

5.1. Termination. In order to prove that the algorithm terminates, we show that each recursive application of $\mathcal{D}[\![\cdot]\!]$ in the right-hand sides of Figures 6 and 8 has a strictly smaller weight than the left-hand side.

The weight of an expression is one plus the sum of the weight of its subexpressions, 
where variables, primitive numbers and function names have weight two. The weight of a 
fresh variable not in the initial input is one. 

Definition 5.1. The weight of a variable $x$ in the initial input, a primitive number $n$, and a function name $g$ is 2. The weight of a fresh variable not in the initial input is 1. The weight of any composite expression ($n > 1$) is $|s(e_1, \ldots, e_n)| = 1 + \sum_{i=1}^{n} |e_i|$.
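The following Python sketch implements Definition 5.1 under an assumed encoding of expressions as tagged tuples. Leaves are `("var", name)`, `("num", n)` and `("fun", name)`. A composite expression whose subexpressions are `e1` to `en` is `(former, e1, ..., en)`. The set of variables from the initial input is passed in explicitly. All of these names are illustrative only.

```python
def weight(e, initial_vars):
    """Weight of an expression following Definition 5.1."""
    tag = e[0]
    if tag in ("num", "fun"):
        return 2
    if tag == "var":
        # Variables of the initial input weigh 2; fresh variables weigh 1.
        return 2 if e[1] in initial_vars else 1
    # Composite s(e1, ..., en): one plus the weights of the subexpressions.
    return 1 + sum(weight(sub, initial_vars) for sub in e[1:])


# append xs xs, and the same call with both arguments replaced by fresh variables
original = ("app", ("fun", "append"), ("var", "xs"), ("var", "xs"))
generalized = ("app", ("fun", "append"), ("var", "x"), ("var", "y"))
```

With `{"xs"}` as the initial variables, `original` weighs 7 and `generalized` weighs 5. Because fresh variables are lighter, replacing subterms by fresh variables, as generalization does, cannot increase the weight.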

Definition 5.2. Let $S$ be a set with a relation $\leq$. Then $(S, \leq)$ is a quasi-order if $\leq$ is reflexive and transitive.

Definition 5.3. Let $(S, \leq)$ be a quasi-order. $(S, \leq)$ is a well-quasi-order if, for every infinite sequence $s_0, s_1, \ldots \in S$, there exist $i < j$ such that $s_i \leq s_j$.

The following lemma tells us that the set of finite sequences over a well-quasi-ordered set is well-quasi-ordered, with one proof by Nash-Williams [32]:

Lemma 5.4 (Higman's lemma). If a set $S$ is well-quasi-ordered, then the set $S^*$ of finite sequences over $S$ is well-quasi-ordered.
D⟦append xs xs⟧   (*)

    (By rule 4 of D_app(), put (h0, append xs xs) in ρ and transform according
    to the rules of the algorithm)

= case xs of
    [] -> xs
    (x' : xs') -> D⟦x'⟧ : D⟦append xs' xs⟧

    (Focus on D⟦append xs' xs⟧ and recall that ρ contains append xs xs,
    so alternative 2 of D_app() is triggered and the transformation returns
    append xs' xs. This returns all the way up to the start (*) and the
    transformation continues there through alternative 4a)

= D⟦append xs xs⟧

    (Generalize the expression with append xs' xs)

= [D⟦xs⟧/x, D⟦xs⟧/y] D⟦append x y⟧

= [xs/x, xs/y] case x of
                 [] -> y
                 (x' : xs') -> D⟦x'⟧ : D⟦append xs' y⟧

= [xs/x, xs/y] case x of
                 [] -> y
                 (x' : xs') -> x' : h0 xs' y

= letrec h0 xs ys = case xs of
                      [] -> ys
                      (x' : xs') -> x' : h0 xs' ys
  in h0 xs xs

Figure 10: Example of upwards generalization

The weight of the entire transformation is a triple consisting of $N - |\rho|$, where $N$ is the maximum length of the memoization list $\rho$; the weight of the term being transformed; and the weight of the current term in focus. That such an $N$ exists follows from Kruskal's Tree Theorem [8] and the fact that the homeomorphic embedding relation is a well-quasi-order.

Theorem 5.5 (Kruskal's Tree Theorem). If $S$ is a finite set of function symbols, then any infinite sequence $t_1, t_2, \ldots$ of terms built from the symbols in $S$ contains two terms $t_i$ and $t_j$ with $i < j$ such that $t_i \trianglelefteq t_j$.

Proof (Similar to Dershowitz [8]). Collapse all integers to a single 0-ary constructor, and all variables to a different 0-ary constructor.

Suppose the theorem were false. Let the infinite sequence $t = t_1, t_2, \ldots$ of terms be a minimal counterexample, measured by the size of the $t_i$. By the minimality hypothesis,
D⟦rev xs []⟧

    (By rule 4 of D_app(), put (h0, rev xs []) in ρ and transform the program
    according to the rules of the algorithm)

= case xs of
    [] -> []
    (x' : xs') -> D⟦rev xs' (x' : [])⟧

    (Focus on the second branch and recall that ρ contains rev xs [], so
    alternative 3 of D_app() is triggered and the expression is generalized)

D⟦rev xs' (x' : [])⟧

    (Generalize the expression with rev xs [])

= [D⟦(x' : [])⟧/zs] D⟦rev xs' zs⟧

= [(x' : [])/zs] D⟦rev xs' zs⟧

    (Put (h1, rev xs' zs) in ρ and transform according to the rules of the
    algorithm)

= [(x' : [])/zs] letrec h1 xs ys = case xs of
                                     [] -> ys
                                     (x' : xs') -> h1 xs' (x' : ys)
                 in h1 xs' zs

= letrec h1 xs ys = case xs of
                      [] -> ys
                      (x' : xs') -> h1 xs' (x' : ys)
  in h1 xs' (x' : [])

    (Putting the two parts together)

case xs of
  [] -> []
  (x' : xs') -> letrec h1 xs ys = case xs of
                                    [] -> ys
                                    (x' : xs') -> h1 xs' (x' : ys)
                in h1 xs' (x' : [])

Figure 11: Example of downwards generalization
the set of proper subterms of the $t_i$ must be well-quasi-ordered, or else there would be a smaller counterexample $t_1, t_2, \ldots, t_{l-1}, s_1, s_2, \ldots$, for some $l$ such that $s_1$ is a subterm of $t_l$ and all $s_2, \ldots$ are subterms of one of $t_l, t_{l+1}, \ldots$. (None of $t_1, t_2, \ldots, t_{l-1}$ can be embedded in any of $s_1, s_2, \ldots$, since that would mean that $t_i$ is also embedded in some $t_j$, $i < l \leq j$.)

Since the set $S$ of function symbols is well-quasi-ordered by $\geq$, there must exist an infinite subsequence $r$ of $t$, the root (outermost) symbols of which constitute a quasi-ascending chain under $\leq$. (Any infinite sequence of elements of a well-quasi-ordered set must contain an infinite chain of quasi-ascending elements.) Since the set of proper subterms is well-quasi-ordered, it follows by Lemma 5.4 that the set of finite sequences consisting of the immediate subterms of the elements in $r$ is also well-quasi-ordered. But then there would have to be an embedding in $t$ itself, in which case it would not be a counterexample. □

We will show that each step of the driving algorithm will reduce the weight of what is 
being transformed. The constant N in the weight is the maximum length of the sequence 
of terms that are not related to each other by the homeomorphic embedding. 

Corollary 5.6. Any infinite sequence $t_1, t_2, \ldots \in T^*$ contains two terms $t_i$ and $t_j$ with $i < j$ such that $t_i \trianglelefteq t_j$.

Corollary 5.7. There is a maximum $N$ such that $t_1, t_2, \ldots, t_N \in T^*$ contains no terms $t_i$ and $t_j$ with $i < j$ and $t_i \trianglelefteq t_j$.

We define the weight of driving a term as: 

Definition 5.8. The weight of a call to the driving algorithm is $|\mathcal{D}[\![e]\!]_{\mathcal{R},\mathcal{G},\rho}| = (N - |\rho|, |\mathcal{R}\langle e\rangle|, |e|)$.

Tuples must be ordered for us to tell whether the weight of a term actually decreases from driving it. We use the standard lexicographic order between tuples.

Definition 5.9. The order between two tuples $(n_1, n_2, n_3)$ and $(m_1, m_2, m_3)$ is:

$(n_1, n_2, n_3) < (m_1, m_2, m_3)$ if $n_1 < m_1$

$(n_1, n_2, n_3) < (m_1, m_2, m_3)$ if $n_1 = m_1$ and $n_2 < m_2$

$(n_1, n_2, n_3) < (m_1, m_2, m_3)$ if $n_1 = m_1$, $n_2 = m_2$ and $n_3 < m_3$
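Definition 5.9 is the usual lexicographic order, which is also how Python compares tuples. The sketch below spells out the three cases and checks them against the built-in comparison. The `weight_triple` helper mirrors Definition 5.8. Its inputs are supplied by hand, since the driving algorithm itself is not modelled here.

```python
from itertools import product


def lex_less(a, b):
    """(n1, n2, n3) < (m1, m2, m3) following the three cases of Definition 5.9."""
    n1, n2, n3 = a
    m1, m2, m3 = b
    return (n1 < m1
            or (n1 == m1 and n2 < m2)
            or (n1 == m1 and n2 == m2 and n3 < m3))


def weight_triple(N, memo_length, whole_weight, focus_weight):
    # Definition 5.8: (N - |rho|, |R<e>|, |e|)
    return (N - memo_length, whole_weight, focus_weight)


def agrees_with_builtin(limit=3):
    """Check lex_less against Python's own tuple ordering on small triples."""
    triples = list(product(range(limit), repeat=3))
    return all(lex_less(a, b) == (a < b) for a in triples for b in triples)
```

For example, `weight_triple(10, 3, 500, 500)` is smaller than `weight_triple(10, 2, 20, 5)`. Extending the memoization list by one entry therefore decreases the weight even when the term being driven grows.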

We also need to show that the memoization list $\rho$ only contains terms that were in the initial input program:

Lemma 5.10. The second component of the memoization list $\rho$ can only contain terms from the set $T$.

Proof. Integers and fresh variables are equal, up to $\trianglelefteq$, to the already existing integers and variables. Our only concern is the rules that introduce new terms that are not in $T$. The new function names $h$ are the only new terms introduced by the algorithm. By inspection of the rules it is clear that only rule R3 introduces such new terms. Inspection of the right-hand sides of $\mathcal{D}_{app}()_{\mathcal{G},\rho}$:

1: No recursive application in the RHS.
2: No recursive application in the RHS.
3: No new terms are created and the memoization list $\rho$ is not extended.
4a: No new terms are created and the memoization list $\rho$ is not extended.
4b: The newly created term $h$ is kept outside of the recursive call of the driving algorithm. The memoization list $\rho$ is extended with terms from $T$.
4c: No new terms are created, and the memoization list $\rho$ is extended with terms from $T$.

□

With these definitions in place, we can formulate a lemma that claims the weight is 
decreasing in each step of our transformation. 

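Lemma 5.11. For each rule $\mathcal{D}[\![e]\!]_{\mathcal{R},\mathcal{G},\rho} = e_1$ in Figures 6 and 8 and each recursive application $\mathcal{D}[\![e']\!]_{\mathcal{R}',\mathcal{G}',\rho'}$ in $e_1$, $|\mathcal{D}[\![e']\!]_{\mathcal{R}',\mathcal{G}',\rho'}| < |\mathcal{D}[\![e]\!]_{\mathcal{R},\mathcal{G},\rho}|$.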

Lemma 5.12 (Totality). For all expressions $\mathcal{R}\langle e\rangle$, $\mathcal{D}[\![e]\!]_{\mathcal{R},\mathcal{G},\rho}$ is matched by a unique rule in Figure 6.

Theorem 5.13 (Termination). The driving algorithm $\mathcal{D}[\![\cdot]\!]$ terminates for all inputs.

Proof. The weight of the transformation is well defined because of Kruskal's Tree Theorem and the fact that the homeomorphic embedding is a well-quasi-order. Lemma 5.10 guarantees that the memoization list $\rho$ only contains terms from the initial input. By Lemma 5.11 the weight of the transformation decreases in each step, and by Lemma 5.12 we know that each recursive application will match a rule.

Since the lexicographic order $<$ is well-founded over triples of natural numbers, the system will eventually terminate. □



5.2. Total Correctness. The problem with previous deforestation and supercompilation 
algorithms in a call-by-value context is that they might change termination properties of 
programs. We prove that our supercompiler does not change what the program computes, 
nor does it alter whether a program terminates or not. 

Sands [39] shows how a transformation can change the semantics in rather subtle ways. Consider the function

$f\,x = x + 42$

It is clear that $f\,0 \cong 42$ (where $\cong$ is semantic equivalence with respect to the current definition). Using this equality and replacing 42 in the function body with $f\,0$ yields:

$f\,x = x + f\,0$
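The difference is easy to observe in any strict language. In this Python sketch, Python's recursion limit turns the non-terminating definition into a `RecursionError`, which serves as a stand-in for divergence. The helper `terminates` is introduced only for this illustration.

```python
def f_original(x):
    # f x = x + 42
    return x + 42


def f_rewritten(x):
    # f x = x + f 0: every call makes another call f 0, so evaluation never finishes
    return x + f_rewritten(0)


def terminates(fn, arg):
    """Stand-in for a termination check: hitting the recursion limit counts as looping."""
    try:
        fn(arg)
    except RecursionError:
        return False
    return True
```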

This function computes something entirely different from the original definition of $f$: a call such as $f\,0$ now never terminates. We need some tools to ensure that the meaning of the original program is preserved, and we therefore introduce the standard notions of operational approximation and equivalence. We use a general context $C$, which is an expression with zero or more holes in place of some subexpressions, and we say that an expression $C[e]$ is closed if there are no free variables in it.
Definition 5.14 (Operational Approximation and Equivalence).

• $e$ operationally approximates $e'$, $e \sqsubseteq e'$, if for all contexts $C$ such that $C[e]$, $C[e']$ are closed, if evaluation of $C[e]$ terminates then so does evaluation of $C[e']$.

• $e$ is operationally equivalent to $e'$, $e \cong e'$, if $e \sqsubseteq e'$ and $e' \sqsubseteq e$.

The correctness of deforestation in a call-by-name setting has previously been shown 
by Sands [39] using his improvement theory. We use Sands's definitions for improvement 
and strong improvement: 
Definition 5.15 (Improvement, Strong Improvement).

• $e$ is improved by $e'$, $e \trianglerighteq e'$, if for all contexts $C$ such that $C[e]$, $C[e']$ are closed, if computation of $C[e]$ terminates using $n$ function calls, then computation of $C[e']$ also terminates, and uses no more than $n$ function calls.

• $e$ is strongly improved by $e'$, $e \trianglerighteq_s e'$, iff $e \trianglerighteq e'$ and $e \cong e'$.

Note that improvement, $\trianglerighteq$, is not the same as the homeomorphic embedding, $\trianglelefteq$, defined previously.

We use $e \mapsto^{k} v$ to denote that $e$ evaluates to $v$ using $k$ function calls (and any other reduction rule as many times as needed), and $e' \mapsto^{\leq k} v'$ to denote that $e'$ evaluates to $v'$ with at most $k$ function calls and using any other reduction rule as many times as needed.

To state the Improvement Theorem, we view a transformation as the introduction of some new functions from a given set of definitions. We let $\{g_i\}_{i \in I}$ be a set of functions indexed by some set $I$, where each function has a fixed arity $\alpha_i$ and is given by some definitions

$\{g_i = \lambda x_1 \ldots x_{\alpha_i}.e_i\}_{i \in I}$

and let $\{e_i'\}_{i \in I}$ be a set of expressions such that for each $i \in I$, $fv(e_i') \subseteq \{x_1 \ldots x_{\alpha_i}\}$. The following results relate to the transformation of the functions $g_i$ using the expressions $e_i'$: let $\{h_i\}_{i \in I}$ be a set of new functions given by the definitions

$\{h_i = [\overline{h}/\overline{g}]\lambda x_1 \ldots x_{\alpha_i}.e_i'\}_{i \in I}$

Theorem 5.16 (Sands Improvement theorem). If $g = e$ and $e \trianglerighteq C[g]$ then $g \trianglerighteq h$ where $h = C[h]$.

Theorem 5.17 (Cost-equivalence theorem). If $e_i \lhd\!\rhd e_i'$ for all $i \in I$, then $g_i \lhd\!\rhd h_i$, $i \in I$.

We need a standard partial correctness result [39] associated with unfold-fold transformations:

Theorem 5.18 (Partial Correctness). If $e_i \cong e_i'$ for all $i \in I$ then $h_i \sqsubseteq g_i$, $i \in I$.

which we combine with Theorem 5.16 to get total correctness for a transformation:

Corollary 5.19. If we have $e_i \trianglerighteq_s e_i'$ for all $i \in I$, then $g_i \trianglerighteq_s h_i$, $i \in I$.

Improvement theory in a call-by-value setting requires Sands's operational metatheory for functional languages [41], where the improvement theory is a simple corollary over the well-founded resource structure $(\mathbb{N}, 0, +, \geq)$. For simplicity of presentation we instantiate Sands's theorems to our language. We use $\equiv$ to denote expressions equal up to renaming of bound variables and borrow a set of improvement laws that will be useful for our proof:

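Lemma 5.20 (Sands [40]). Improvement laws

(1) If $e \trianglerighteq e'$ then $C[e] \trianglerighteq C[e']$.

(2) If $e \equiv e'$ then $e \trianglerighteq e'$.

(3) If $e \trianglerighteq e'$ and $e' \trianglerighteq e''$ then $e \trianglerighteq e''$.

(4) If $e \mapsto e'$ then $e \trianglerighteq e'$.

(5) If $e \trianglerighteq e'$ then $e \sqsubseteq e'$.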

It is sometimes convenient to show that two expressions are related by showing that what they evaluate to is related.

Lemma 5.21 (Sands [39]). If $e_1 \mapsto^{k} e_1'$ and $e_2 \mapsto^{k} e_2'$ then $(e_1' \lhd\!\rhd e_2' \iff e_1 \lhd\!\rhd e_2)$.
We need to show strong improvement in order to prove total correctness. Since strong improvement is improvement in one direction and operational approximation in the other direction, a set of approximation laws that correspond to the improvement laws in Lemma 5.20 is necessary.

Lemma 5.22. Approximation laws

(1) If $e \sqsupseteq e'$ then $C[e] \sqsupseteq C[e']$.

(2) If $e \equiv e'$ then $e \sqsupseteq e'$.

(3) If $e \sqsupseteq e'$ and $e' \sqsupseteq e''$ then $e \sqsupseteq e''$.

(4) If $e \mapsto e'$ then $e \sqsupseteq e'$.

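Combining Lemma 5.20 and Lemma 5.22 gives us the final tools we need to prove strong improvement:

Lemma 5.23. Strong Improvement laws

(1) If $e \trianglerighteq_s e'$ then $C[e] \trianglerighteq_s C[e']$.

(2) If $e \equiv e'$ then $e \trianglerighteq_s e'$.

(3) If $e \trianglerighteq_s e'$ and $e' \trianglerighteq_s e''$ then $e \trianglerighteq_s e''$.

(4) If $e \mapsto e'$ then $e \trianglerighteq_s e'$.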

If two expressions are improvements of each other, they are considered cost equivalent. 
Cost equivalence also implies strong improvement, which will be useful in many parts of 
our proof of total correctness for our supercompiler. 

Definition 5.24 (Cost equivalence). The expressions $e$ and $e'$ are cost equivalent, $e \lhd\!\rhd e'$, iff $e \trianglerighteq e'$ and $e' \trianglerighteq e$.

A local form of the improvement theorem, which deals with local expression-level recursion expressed with a fixed-point combinator or with a letrec definition, is necessary. This is analogous to the work by Sands ^], with slight modifications for call-by-value.

We need to relate local recursion expressed using fix to the recursive definitions for which the improvement theorem is defined. This is solved by a technical lemma that relates the cost of terms of a certain form to their recursive counterparts.

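Theorem 5.25. For all expressions $e$, if $\lambda g.e$ is closed, then $\mathit{fix}\,(\lambda g.e) \lhd\!\rhd h$, where $h$ is a new function defined by $h = [\lambda n.h\,n/g]e$.

Proof (Similar to Sands [39]). Define a helper function $h^- = [\lambda n.\mathit{fix}\,(\lambda g.e)\,n/g]e$. Since

$$\mathit{fix}\,(\lambda g.e) \mapsto^{1} (\lambda f.f\,(\lambda n.\mathit{fix}\,f\,n))\,(\lambda g.e) \mapsto (\lambda g.e)\,(\lambda n.\mathit{fix}\,(\lambda g.e)\,n) \mapsto [\lambda n.\mathit{fix}\,(\lambda g.e)\,n/g]e$$

and $h^- \mapsto^{1} [\lambda n.\mathit{fix}\,(\lambda g.e)\,n/g]e$, it follows by Lemma 5.21 that $\mathit{fix}\,(\lambda g.e) \lhd\!\rhd h^-$. Since cost equivalence is a congruence relation, we have that $[\lambda n.h^-\,n/g]e \lhd\!\rhd [\lambda n.\mathit{fix}\,(\lambda g.e)\,n/g]e$, and so by Theorem 5.17 we have a cost-equivalent transformation from $h^-$ to $h$, where $h = [h/h^-][\lambda n.h^-\,n/g]e = [\lambda n.h\,n/g]e$. □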
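The η-expanded argument $\lambda n.\mathit{fix}\,f\,n$ is what makes fix usable under call-by-value. The Python sketch below uses the fact that Python is itself call-by-value. It contrasts this definition with the naive `fix f = f (fix f)`, which loops before `f` is ever applied. It also shows a named recursive function `h` corresponding to $h = [\lambda n.h\,n/g]e$ for a factorial body. The names `fix_naive`, `body` and `diverges` are illustrative assumptions, and hitting Python's recursion limit stands in for non-termination.

```python
def fix_naive(f):
    # fix f = f (fix f): the argument is evaluated first, so this recurses forever
    return f(fix_naive(f))


def fix(f):
    # fix f = f (\n. fix f n): the lambda delays unfolding until it is applied
    return f(lambda n: fix(f)(n))


def body(g):
    # e with free g: the factorial step, abstracted over the recursive call g
    return lambda n: 1 if n == 0 else n * g(n - 1)


def h(n):
    # h = [\n. h n / g] e: a named recursive function in place of fix
    return body(lambda m: h(m))(n)


def diverges(thunk):
    """Treat hitting Python's recursion limit as non-termination."""
    try:
        thunk()
    except RecursionError:
        return True
    return False
```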

We state some simple properties that will be useful for proving our local improvement theorem:

Theorem 5.26. Consequences of the letrec definition
i): $\mathtt{letrec}\ h = \lambda x.e\ \mathtt{in}\ e' \lhd\!\rhd [\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e)\,n/h]e'$
ii): $\mathtt{letrec}\ h = \lambda x.e\ \mathtt{in}\ h \lhd\!\rhd \lambda n.\mathit{fix}\,(\lambda h.\lambda x.e)\,n$
iii): $\mathtt{letrec}\ h = \lambda x.e\ \mathtt{in}\ e' \lhd\!\rhd [\mathtt{letrec}\ h = \lambda x.e\ \mathtt{in}\ h/h]e'$

Proof. For i), expand the definition of letrec in the LHS, $(\lambda h.e')\,(\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e)\,n)$, and evaluate it one step to $[\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e)\,n/h]e'$. This is syntactically equivalent to the RHS, hence cost equivalent. For ii), set $e' = h$ and perform the substitution from i). For iii), use the RHS of ii) in the substitution and notice that it is equivalent to i). □

This allows us to state the local version of the improvement theorem: 

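Theorem 5.27 (Local improvement theorem). If variables $h$ and $x$ include all the free variables of both $e_0$ and $e_1$, then if

$\mathtt{letrec}\ h = \lambda x.e_0\ \mathtt{in}\ e_0 \trianglerighteq_s \mathtt{letrec}\ h = \lambda x.e_0\ \mathtt{in}\ e_1$

then for all expressions $e$

$\mathtt{letrec}\ h = \lambda x.e_0\ \mathtt{in}\ e \trianglerighteq_s \mathtt{letrec}\ h = \lambda x.e_1\ \mathtt{in}\ e$

Proof. Define a new function $g = [\lambda n.g\,n/h]\lambda x.e_0$. By Theorem 5.25, $g \lhd\!\rhd \mathit{fix}\,(\lambda h.\lambda x.e_0)$. Use this, the congruence properties, and the properties listed in Theorem 5.26 to transform the premise of the theorem:

$\mathtt{letrec}\ h = \lambda x.e_0\ \mathtt{in}\ e_0 \trianglerighteq_s \mathtt{letrec}\ h = \lambda x.e_0\ \mathtt{in}\ e_1$

$[\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e_0)\,n/h]e_0 \trianglerighteq_s [\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e_0)\,n/h]e_1$

$\lambda x.[\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e_0)\,n/h]e_0 \trianglerighteq_s \lambda x.[\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e_0)\,n/h]e_1$

$[\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e_0)\,n/h]\lambda x.e_0 \trianglerighteq_s [\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e_0)\,n/h]\lambda x.e_1$

$[\lambda n.g\,n/h]\lambda x.e_0 \trianglerighteq_s [\lambda n.g\,n/h]\lambda x.e_1$

So by Corollary 5.19, $g \trianglerighteq_s g'$ where $g' = [g'/g][\lambda n.g\,n/h]\lambda x.e_1 = [\lambda n.g'\,n/h]\lambda x.e_1$. Hence by Theorem 5.25, $g' \lhd\!\rhd \mathit{fix}\,(\lambda h.\lambda x.e_1)$. Adding it all together yields $\mathit{fix}\,(\lambda h.\lambda x.e_0) \lhd\!\rhd g \trianglerighteq_s g' \lhd\!\rhd \mathit{fix}\,(\lambda h.\lambda x.e_1)$. From the transitivity and congruence properties of improvement we can deduce that $\lambda n.\mathit{fix}\,(\lambda h.\lambda x.e_0)\,n \trianglerighteq_s \lambda n.\mathit{fix}\,(\lambda h.\lambda x.e_1)\,n$. By Theorem 5.26 we get $\mathtt{letrec}\ h = \lambda x.e_0\ \mathtt{in}\ h \trianglerighteq_s \mathtt{letrec}\ h = \lambda x.e_1\ \mathtt{in}\ h$, which can be further expanded by congruence properties of improvement to $[\mathtt{letrec}\ h = \lambda x.e_0\ \mathtt{in}\ h/h]e \trianglerighteq_s [\mathtt{letrec}\ h = \lambda x.e_1\ \mathtt{in}\ h/h]e$. Using Theorem 5.26 one more time yields $\mathtt{letrec}\ h = \lambda x.e_0\ \mathtt{in}\ e \trianglerighteq_s \mathtt{letrec}\ h = \lambda x.e_1\ \mathtt{in}\ e$, which proves our theorem. □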

This allows us to state the total correctness theorem for our transformation: 

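Theorem 5.28 (Total Correctness). Let $\mathcal{R}\langle e\rangle$ be an expression, $\mathcal{G}$ a recursive map, and $\rho$ an environment such that

• the range of $\rho$ contains only closed expressions, and

• $fv(\mathcal{R}\langle e\rangle) \cap dom(\rho) = \emptyset$,

then $\mathcal{R}\langle e\rangle \trianglerighteq_s \rho(\mathcal{D}[\![e]\!]_{\mathcal{R},\mathcal{G},\rho})$.

The proof is in Appendix A.1 to Appendix A.20.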
6. Benchmarks

In this section we provide measurements on a set of common examples from the literature on deforestation and perform a detailed analysis for each example. We show that our positive supercompiler removes intermediate structures and can improve the performance by an order of magnitude for certain benchmarks. The supercompiler was implemented as a pass in the Timber compiler [34]. Timber is a pure functional call-by-value language which is very close to the language we describe in Section 3, and for the scope of this article it can be thought of as a strict variant of Haskell. We have left out the full details of the instrumentation of the run-time system, but they are available in a separate report [19].

All measurements were performed on an idle machine running in an xterm terminal environment. Each test was run 10 consecutive times and the best result was selected: since the programs are deterministic, the fastest run is the one least disturbed by other activity on the machine. The number of allocations and the total allocation sizes remained constant over all runs.
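The best-of-ten protocol can be sketched as follows. RDTSC is not directly accessible from Python, so this sketch substitutes `time.perf_counter_ns`. The name `best_of` and the workloads used with it are hypothetical and serve only as an illustration.

```python
import time


def best_of(run, repetitions=10):
    """Run `run` repeatedly and return the fastest elapsed time in nanoseconds."""
    best = None
    for _ in range(repetitions):
        start = time.perf_counter_ns()
        run()
        elapsed = time.perf_counter_ns() - start
        # Keep the minimum: for a deterministic program, extra time is noise from other activity.
        if best is None or elapsed < best:
            best = elapsed
    return best
```

For instance, `best_of(lambda: sum(range(100_000)))` reports the fastest of ten runs of that workload.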

Raw data for the time and size measurements before and after supercompilation are shown in Table 1, and allocation measures in Table 2. Compilation times are shown in Table 3. The time column is the number of clock ticks obtained from the RDTSC instruction available on Intel/AMD processors, and the binary size is in bytes. The total number of allocations and the total memory size in bytes allocated by the program are displayed in their respective columns. The compilation times are measured in seconds, and the columns from left to right are for producing an object file, producing an executable binary, and the corresponding operations with supercompilation turned on.

Binary sizes are slightly increased by the supercompiler in most cases, but all run-times are shorter. The main reason for the performance improvement is the removal of intermediate structures, which reduces the number of memory allocations. Compilation times are increased by 10-15% when enabling the supercompiler.
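As a quick cross-check of these statements, the ratios can be recomputed from Table 1. The values below are copied from the table, and the helper names are ours.

```python
# (time before, time after, binary size before, binary size after), from Table 1.
# Times are RDTSC clock ticks; sizes are bytes.
TABLE_1 = {
    "Double Append": (105_844_704, 89_820_912, 89_484, 90_800),
    "Factorial": (21_552, 21_024, 88_968, 88_968),
    "Flip a Tree": (2_131_188, 237_168, 95_452, 104_704),
    "Sum of Squares of a Tree": (276_102_012, 28_737_648, 95_452, 104_912),
    "Kort's Raytracer": (12_050_880, 7_969_224, 91_968, 91_460),
}


def summarize(table):
    """Map each benchmark to (run-time speedup, binary size change in percent)."""
    result = {}
    for name, (t_before, t_after, size_before, size_after) in table.items():
        speedup = t_before / t_after
        size_change = 100.0 * (size_after - size_before) / size_before
        result[name] = (speedup, size_change)
    return result


for name, (speedup, size_change) in summarize(TABLE_1).items():
    print(f"{name:26} speedup {speedup:5.2f}x  binary size {size_change:+.1f}%")
```

This gives speedups of roughly 9.0x for Flip a Tree and 9.6x for Sum of Squares of a Tree, and about 1.2x for Double Append and 1.5x for the raytracer. Binary size changes range from -0.6% (the raytracer) to +9.9%.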

The supercompiled results on these particular benchmarks are identical to the results reported in previous work for call-by-name languages by Wadler [57] and Sørensen et al. [43]. We do not provide any execution-time comparisons with these, though, since for identical intermediate representations after supercompilation, such measurements would only illustrate differences caused by back-end implementation techniques.

The work on Supero by Mitchell and Runciman [30] shows that open problems remain when supercompiling large Haskell programs. These problems are mainly related to speed, both of the compiler and of the transformed program. When profiling Supero, Mitchell and Runciman found that a majority of the time was spent on their homeomorphic embedding test. Our transformation performs the corresponding test on a smaller part of the abstract syntax tree, so there is reason to believe that our transformation will spend less time on testing homeomorphic embedding, even on large programs. The complexity of the homeomorphic embedding relation has been investigated by Narendran and Stillman [31], who give an algorithm of complexity $O(\mathit{size}(e) \times \mathit{size}(f))$ for deciding whether $e \trianglelefteq f$. We expect essentially the same problems that Mitchell and Runciman observed to appear in a call-by-value context as well, and intend to investigate them now that we have a theoretical foundation for our transformation.
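For reference, a memoized homeomorphic embedding check can be sketched in Python. It follows the usual variable/diving/coupling formulation, with all variables treated as mutually embedded. It is not Narendran and Stillman's algorithm. The term representation (variables as strings, other terms as tuples headed by their root symbol) is an assumption. Caching on pairs of subterms keeps the number of distinct subproblems bounded by the number of subterm pairs, in the spirit of the quadratic bound. The `first_embedding` helper scans a finite sequence for the pair $i < j$ whose existence Kruskal's Tree Theorem guarantees for infinite sequences.

```python
from functools import lru_cache

# Assumed representation: variables are strs; other terms are tuples
# (root_symbol, arg1, ..., argN).


@lru_cache(maxsize=None)
def embedded(e, f):
    """True if e is homeomorphically embedded in f (e ⊴ f)."""
    if isinstance(e, str) and isinstance(f, str):
        return True  # variables embed in variables
    if isinstance(f, str):
        return False  # a non-variable term never embeds in a variable
    # Diving: e is embedded in some immediate subterm of f.
    if any(embedded(e, sub) for sub in f[1:]):
        return True
    # Coupling: same root symbol and arity, arguments embedded pairwise.
    return (isinstance(e, tuple) and e[0] == f[0] and len(e) == len(f)
            and all(embedded(a, b) for a, b in zip(e[1:], f[1:])))


def first_embedding(terms):
    """Return the first pair (i, j) with i < j and terms[i] ⊴ terms[j], or None."""
    for j in range(len(terms)):
        for i in range(j):
            if embedded(terms[i], terms[j]):
                return i, j
    return None
```

For example, `rev xs []` is embedded in `rev xs' (x' : [])`, which is the situation in the downwards-generalization example of Figure 11.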
                              Time (clock ticks)             Binary size (bytes)
Benchmark                     Before          After          Before      After
Double Append                 105,844,704     89,820,912     89,484      90,800
Factorial                     21,552          21,024         88,968      88,968
Flip a Tree                   2,131,188       237,168        95,452      104,704
Sum of Squares of a Tree      276,102,012     28,737,648     95,452      104,912
Kort's Raytracer              12,050,880      7,969,224      91,968      91,460

Table 1: Time and size measurements



                              Allocations              Alloc size (bytes)
Benchmark                     Before       After       Before        After
Double Append                 270,035      180,032     2,160,280     1,440,256
Factorial                     9            9           68            68
Flip a Tree                   20,504       57          180,480       620
Sum of Squares of a Tree      4,194,338    91          29,360,496    908
Kort's Raytracer              60,021       17          320,144       124

Table 2: Allocation measurements



                              Not supercompiled (s)    Supercompiled (s)
Benchmark                     -c        -make          -c -S     -make -S
Double Append                 0.183     0.300          0.202     0.319
Factorial                     0.095     0.213          0.097     0.216
Flip a Tree                   0.211     0.223          0.230     0.347
Sum of Squares of a Tree      0.214     0.332          0.234     0.349
Kort's Raytracer              0.239     0.359          0.278     0.399

Table 3: Compilation times

6.1. Double Append. As previously seen, supercompiling the appending of three lists saves one traversal over the first list. This is an example by Wadler [57], and the intermediate structure is fused away by our supercompiler. The program is:

append xs ys = case xs of
                 [] -> ys
                 (x' : xs') -> x' : (append xs' ys)

main xs ys zs = append (append xs ys) zs

Supercompiling this program gives the same result that we obtained manually in Section 2:

h1 xs1 ys1 zs1 = case xs1 of
                   [] -> case ys1 of
                           [] -> zs1
                           (y1' : ys1') -> y1' : (h2 ys1' zs1)
                   (x1' : xs1') -> x1' : (h1 xs1' ys1 zs1)
h2 xs2 ys2 = case xs2 of
               [] -> ys2
               (x2' : xs2') -> x2' : (h2 xs2' ys2)
main xs ys zs = h1 xs ys zs
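To make the saving concrete, here is a Python sketch that runs the original and the residual program on small cons lists and counts the cons cells each one allocates. Lists are encoded as nested `(head, tail)` tuples with `None` for `[]`. This encoding and the allocation counter are assumptions of the illustration, not a model of Timber's run-time system.

```python
allocated = [0]  # cons cells built by the program under test


def build(values):
    """Turn a Python list into nested (head, tail) cells; None is []. Not counted."""
    xs = None
    for v in reversed(values):
        xs = (v, xs)
    return xs


def to_list(xs):
    out = []
    while xs is not None:
        out.append(xs[0])
        xs = xs[1]
    return out


def cons(x, xs):
    allocated[0] += 1
    return (x, xs)


def append(xs, ys):
    # append xs ys = case xs of [] -> ys; (x' : xs') -> x' : append xs' ys
    if xs is None:
        return ys
    return cons(xs[0], append(xs[1], ys))


def h2(xs, ys):
    # residual h2: append under a new name
    if xs is None:
        return ys
    return cons(xs[0], h2(xs[1], ys))


def h1(xs, ys, zs):
    # residual h1: one pass over xs, then ys via h2; append xs ys is never built
    if xs is None:
        if ys is None:
            return zs
        return cons(ys[0], h2(ys[1], zs))
    return cons(xs[0], h1(xs[1], ys, zs))


def run(program, *args):
    """Return (result as a Python list, cons cells allocated)."""
    allocated[0] = 0
    result = program(*args)
    return to_list(result), allocated[0]
```

With three lists of length three, the original program allocates 9 cells and the residual program 6. This is roughly the same two-thirds ratio that appears in the Double Append row of Table 2.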
In this measurement, three strings of 9000 characters each were appended to each other into a 27,000-character string. As can be seen in Table 2, the number of allocations goes down as one iteration over the first string is avoided. The binary size increases by 1,316 bytes, on a binary of roughly 90 KB.

6.2. Factorial. There are no intermediate lists created in a standard implementation of a factorial function, so any performance improvements must come from inlining or static reductions.

fac 0 = 1
fac n = n * fac (n - 1)

main = show (fac 3)

The program is transformed to:

h 0 = 1
h n = n * h (n - 1)

main = show (3 * h 2)
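A Python sketch that counts function calls makes the saved recursive call visible. The shared call counter is an assumption of the illustration.

```python
calls = [0]  # number of calls to fac or h


def fac(n):
    calls[0] += 1
    return 1 if n == 0 else n * fac(n - 1)


def h(n):
    # residual copy of fac produced by the supercompiler
    calls[0] += 1
    return 1 if n == 0 else n * h(n - 1)


def main_original():
    return str(fac(3))


def main_transformed():
    # the first unfolding of fac 3 has been performed statically
    return str(3 * h(2))


def count_calls(main):
    calls[0] = 0
    return main(), calls[0]
```

Both versions print 6. The original makes four calls and the transformed program three.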

One recursive call and a couple of reductions are eliminated, thereby slightly reducing the run-time. The number of allocations remains the same and the final binary size is unchanged.

6.3. Flip a Tree. Flipping a tree is another example by Wadler [57], and just like Wadler we perform a double flip (thus restoring the original tree) before printing the total sum of all leaves.

data Tree a = Leaf a | Branch (Tree a) (Tree a)

sumtr (Leaf a) = a
sumtr (Branch l r) = sumtr l + sumtr r

flip (Leaf x) = Leaf x
flip (Branch l r) = Branch (flip r) (flip l)

main xs = let ys = (flip (flip xs)) in show (sumtr ys)

This is transformed into:

h t = case t of
        Leaf d -> d
        Branch l r -> (h l) + (h r)

main xs = show (case xs of
                  Leaf d -> d
                  Branch l r -> (h l) + (h r))

A binary tree of depth 12 was used in the measurement. The function h is isomorphic to sumtr in the input program, and the double flip has been eliminated. Both the total number of allocations and the total size of allocations are reduced, and the run-time is reduced by an order of magnitude. The binary size increases by about 10%, though.
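The following Python sketch checks the transformation on a small tree and counts the tree nodes each version builds. It uses depth 3 rather than the measured depth 12. The tuple encoding of trees and the allocation counter are assumptions. Calling `h` on the whole tree is equivalent to the case expression in the transformed `main`.

```python
allocated = [0]  # tree nodes built by the program under test


def leaf(x):
    allocated[0] += 1
    return ("Leaf", x)


def branch(left, right):
    allocated[0] += 1
    return ("Branch", left, right)


def sumtr(t):
    return t[1] if t[0] == "Leaf" else sumtr(t[1]) + sumtr(t[2])


def flip(t):
    # flip (Leaf x) = Leaf x; flip (Branch l r) = Branch (flip r) (flip l)
    if t[0] == "Leaf":
        return leaf(t[1])
    return branch(flip(t[2]), flip(t[1]))


def h(t):
    # residual program: h is sumtr and the double flip is gone
    return t[1] if t[0] == "Leaf" else h(t[1]) + h(t[2])


def full_tree(depth, start=0):
    """Complete tree with leaves start, start+1, ...; built without counting."""
    if depth == 0:
        return ("Leaf", start)
    half = 2 ** (depth - 1)
    return ("Branch", full_tree(depth - 1, start), full_tree(depth - 1, start + half))
```

On `full_tree(3)`, whose leaves are 0 to 7, both versions compute 28. The original allocates 30 nodes (15 per flip), while `h` allocates none.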
6.4. Sum of Squares of a Tree. Computing the sum of the squares of the data members of a tree is the final example by Wadler [57].

data Tree a = Leaf a | Branch (Tree a) (Tree a)

square :: Int -> Int
square x = x * x

sumtr (Leaf x) = x
sumtr (Branch l r) = sumtr l + sumtr r

squaretr (Leaf x) = Leaf (square x)
squaretr (Branch l r) = Branch (squaretr l) (squaretr r)

main xs = show (sumtr (squaretr xs))

This is transformed to:

h t = case t of
        Leaf d -> d * d
        Branch l r -> (h l) + (h r)

main xs = show (case xs of
                  Leaf d -> d * d
                  Branch l r -> (h l) + (h r))

Almost all allocations are removed by our supercompiler, but the binary size is increased 
by nearly 10%. The run-time is improved by an order of magnitude. 
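The same kind of Python check applies here. The sketch assumes a tuple encoding of trees and a node counter. It shows that the residual `h` computes the sum of squares without building the intermediate squared tree.

```python
allocated = [0]  # tree nodes built by the program under test


def node(*fields):
    allocated[0] += 1
    return fields


def square(x):
    return x * x


def sumtr(t):
    return t[1] if t[0] == "Leaf" else sumtr(t[1]) + sumtr(t[2])


def squaretr(t):
    # squaretr builds a whole new tree, which sumtr then consumes
    if t[0] == "Leaf":
        return node("Leaf", square(t[1]))
    return node("Branch", squaretr(t[1]), squaretr(t[2]))


def h(t):
    # residual program: squares at the leaves, no intermediate tree
    return t[1] * t[1] if t[0] == "Leaf" else h(t[1]) + h(t[2])


tree = ("Branch", ("Branch", ("Leaf", 1), ("Leaf", 2)), ("Leaf", 3))
```

For this tree both versions compute 1 + 4 + 9 = 14. The original builds 5 nodes and the residual program none.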



6.5. Kort's Raytracer. The inner loop of a raytracer [22] written in Haskell is extracted and transformed.

zipWith f (x : xs) (y : ys) = (f x y) : zipWith f xs ys
zipWith _ _ _ = []

sum :: [Int] -> Int
sum [] = 0
sum (x : xs) = x + sum xs

main xs ys = sum (zipWith (*) xs ys)

The transformed result is:

h xs ys = case xs of
            (x' : xs') -> case ys of
                            (y' : ys') -> (x' * y') + (h xs' ys')
                            _ -> 0
            _ -> 0

main xs ys = h xs ys

The total run-time, the number of allocations, the total size of allocations and the 
binary size all decrease. 
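A Python sketch of the raytracer loop shows the intermediate list from zipWith disappearing. It assumes cons lists encoded as nested `(head, tail)` tuples with `None` for `[]`, plus a cell counter. The residual `h` returns 0 as soon as either list ends, just like `zipWith _ _ _ = []` followed by `sum [] = 0`.

```python
allocated = [0]  # cons cells built by the program under test


def build(values):
    """Cons list as nested (head, tail) tuples, None for []; not counted."""
    xs = None
    for v in reversed(values):
        xs = (v, xs)
    return xs


def zip_with(f, xs, ys):
    # zipWith f (x : xs) (y : ys) = f x y : zipWith f xs ys; zipWith _ _ _ = []
    if xs is None or ys is None:
        return None
    allocated[0] += 1
    return (f(xs[0], ys[0]), zip_with(f, xs[1], ys[1]))


def sum_list(xs):
    # sum [] = 0; sum (x : xs) = x + sum xs
    return 0 if xs is None else xs[0] + sum_list(xs[1])


def h(xs, ys):
    # residual program: multiply and add directly, stopping when either list ends
    if xs is not None and ys is not None:
        return xs[0] * ys[0] + h(xs[1], ys[1])
    return 0
```

For the lists [1, 2, 3] and [4, 5, 6, 7], both versions compute 32. The original allocates three cells for the intermediate list, and `h` allocates none.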
7. Related Work

There is much literature concerning algorithms that remove intermediate structures in functional programs. However, most of these works are in the context of call-by-name or call-by-need languages, which makes the task of supercompilation a different, though still difficult, problem. We therefore start our survey of related work with one call-by-value transformation and then look at the related transformations from a call-by-name or call-by-need perspective.

7.1. Lightweight Fusion. Ohori and Sasano's Lightweight Fusion [35] works by promoting functions through the fixed-point operator and guarantees termination by limiting inlining to at most once per function. They implement their transformation in a compiler for a variant of Standard ML and present some benchmarks. The algorithm is proven correct for a call-by-name language. It is explicitly mentioned that their goal is to extend the transformation to work for an impure call-by-value functional language.

Comparing lightweight fusion to our positive supercompiler is somewhat difficult, since the algorithms themselves are not very similar. Comparing the results of the algorithms is more straightforward: the restriction to inline functions only once makes lightweight fusion unable to handle successive applications of the same function or mutually recursive functions, something the positive supercompiler handles gracefully.

Despite the early stage of their work, Ohori and Sasano are proposing an interesting 
approach that appears quite powerful. 



7.2. Deforestation. Deforestation was pioneered by Wadler [57] for a first-order language more than fifteen years ago. The function macros supported by the initial deforestation algorithm were not capable of fully emulating higher-order functions.

Marlow and Wadler [27] addressed the first-order restriction in a subsequent article. This work was refined in Marlow's [1995] dissertation, where he also related deforestation to the cut-elimination principle of logic. Chin [5] has also generalised Wadler's deforestation to higher-order functional programs by using syntactic properties to decide which terms can be fused.



Both Hamilton [ij] and Marlow [28] have proven that their deforestation algorithms 
terminate. More recent work by Hamilton [13] extends deforestation with a treeless form 
that is easy to recognise and handles a wide range of functions, giving more transparency 
for the programmer. 

Alimarine and Smetsers [2] have improved the producer and consumer analyses in Chin's [1994] algorithm to be based on semantics rather than syntax. They show that their algorithm can remove much of the overhead introduced by generic programming [16].

While these works are algorithmically rather close to ours due to the close relationship between deforestation and positive supercompilation, they assume either a call-by-name or call-by-need context, and are thus not applicable to the kind of languages we target.



7.3. Supercompilation. Closely related to deforestation is supercompilation [52, 53, 54, 55]. Supercompilation both removes intermediate structures and achieves partial evaluation, as well as some other optimisations. In partial evaluation terminology, the decision of when to inline is taken online. The initial studies on supercompilation were for the functional language Refal [56]. The supercompiler Scp4 ^] is implemented in Refal and is the most well-known implementation from this line of work.

The positive supercompiler [47] is a variant which only propagates positive information, such as inferred equalities between terms. The propagation is done by unification, and the work highlights how similar deforestation and positive supercompilation really are. Narrowing-driven partial evaluation [3, 11] is the functional logic programming equivalent of positive supercompilation, but formulated as a term rewriting system. Their approach also deals with non-determinism from backtracking, which makes the corresponding algorithms more complicated.

Strengthening the information propagation mechanism to propagate not only positive, 
but also negative information, yields perfect supercompilation 43, |43|]. Negative information 



is the opposite of positive information, namely inequalities. These inequalities can be used, for example, to prune case-expression branches known not to be applicable.

More recently, Mitchell and Runciman [30] have worked on supercompiling Haskell. 
They report run-time reductions of up to 55% when their supercompiler is used in conjunc- 
tion with GHC. 

Supercompilation has seen applications beyond program optimization: verification of cache coherence protocols [20] and proving term equivalence [21] are two examples. We do not believe that our supercompiler is useful for these applications, since it is inherently weaker than the corresponding supercompiler with call-by-name semantics.

The positive supercompiler by Sørensen et al. [43] is the immediate ancestor of our work, although we have extended it to a higher-order language and converted it to work correctly for call-by-value languages.

7.4. Generalized Partial Computation. GPC [9, 50] uses a theorem prover to extract additional properties about the program being specialized. Among these properties are the logical structure of a program, axioms for abstract data types, and algebraic properties of primitive functions.

The theorem prover is applied whenever a test is encountered, in order to determine which subset of the execution branches can actually be taken. Information about the predicate that was tested is propagated along the branches that are left in the resulting program. GPC is such a powerful transformation because it assumes the unlimited power of a theorem prover.

Futamura et al. [10] have applied GPC in a call-by-value setting in a system called WSDFU (Waseda Simplify-Distribute-Fold-Unfold), and report many successful experiments where optimal or near-optimal residual programs are produced. It is unclear whether WSDFU preserves termination behavior or whether it is a call-by-name transformation applied to a call-by-value language.

We note that the rules for the first-order language presented by Takano [50] are very similar to the positive supercompiler, but the requirement for a theorem prover might exclude the technique as a candidate for automatic compiler optimisations. The lack of termination guarantees for the transformation might be another obstacle. Considering the similarities between GPC and positive supercompilation, it should be straightforward to convert GPC to a call-by-value setting.
7.5. Other Transformations. Considering the vast amount of research conducted on program transformations in general, we only briefly survey other related transformations.

7.5.1. Partial Evaluation. Partial evaluation [?] is another instance of Burstall and Darlington's [1977] informal class of fold/unfold transformations.

If partial evaluation is performed offline, the process is guided by program annotations 
that tell when to fold, unfold, instantiate and define functions. Binding-Time Analysis 
(BTA) is a program analysis that annotates operations in the input program based on 
whether they are statically known or not. 

Partial evaluation does not remove intermediate structures, something we deem necessary to enable the programmer to write programs in the clear and concise listful style. Both deforestation and supercompilation simulate call-by-name evaluation in the transformer, whereas partial evaluation simulates call-by-value. Sørensen et al. [46] suggest that this difference might affect the strength of the transformation.

7.5.2. Short Cut Fusion. Short cut deforestation [?, 13] takes a different approach to deforestation, sacrificing some generality by only working on lists.

The idea is that the constructors Nil and Cons can be replaced by the arguments of a foldr consumer, and a special function build is used to enable the transformation to recognize the producer and to enforce a type requirement. Intermediate lists produced with build and consumed with foldr can then be removed with the foldr/build rule:

foldr f c (build g) = g f c
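As an executable reading of the rule, here is a minimal Python sketch. The producer `upto_g` is a hypothetical example. Python lists are built eagerly, so this illustrates only the equational content of foldr/build, not how GHC applies the rule.

```python
from typing import Callable, List, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def foldr(f: Callable[[A, B], B], c: B, xs: List[A]) -> B:
    """foldr f c [x1, ..., xn] = f(x1, f(x2, ... f(xn, c)))."""
    acc = c
    for x in reversed(xs):
        acc = f(x, acc)
    return acc


def build(g):
    """Run a producer g, abstracted over (cons, nil), with the real list constructors."""
    return g(lambda x, xs: [x] + xs, [])


def upto_g(n: int):
    """Hypothetical producer of [1..n], written against abstract cons and nil."""
    def g(cons, nil):
        acc = nil
        for i in range(n, 0, -1):
            acc = cons(i, acc)
        return acc
    return g


def add(x: int, acc: int) -> int:
    return x + acc


unfused = foldr(add, 0, build(upto_g(5)))  # materializes [1, 2, 3, 4, 5] first
fused = upto_g(5)(add, 0)                  # right-hand side of the rule: no list
print(unfused, fused)                      # 15 15
```

Since `build` is the only place where real constructors are supplied, a `foldr` consumer can instead hand its own `f` and `c` straight to the producer.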

It is the responsibility of the programmer or compiler writer to make sure list-traversing 
functions are written using build and foldr, thereby cluttering the code with information for 
the optimiser and making it harder to read and understand for humans. 

Gill implemented and measured short cut deforestation in GHC using the nofib benchmark suite [37]. Around a dozen benchmarks improved by more than 5%, the average improvement was 3%, and only one example got noticeably worse, by 1%. Heap allocations were reduced, by half in one particular case.

The main argument for short cut deforestation is its simplicity on the compiler side compared to full-blown deforestation. GHC currently contains a variant of short cut deforestation implemented using rewrite rules [38].

Takano and Meijer [51] generalized short cut deforestation to work for any algebraic datatype through the acid rain theorem. Ghani and Johann [11] have also generalized the foldr/build rule to a fold/superbuild rule that can eliminate intermediate structures of inductive types without disturbing the contexts in which they are situated.

Launchbury and Sheard [?] worked on automatically transforming programs into a form suitable for short cut deforestation. Onoue et al. [36] showed an implementation of the acid rain theorem for Gofer where they could automatically transform recursive functions into a form suitable for short cut fusion.

Chitil [?] used type inference to transform the producer of lists into the abstracted form required by short cut deforestation. Given a type-inference algorithm which infers the most general type, Chitil is able to determine the list constructors that need to be replaced in one pass.

From the principal type property of the type inference algorithm, Chitil was also able to deduce completeness of the list abstraction algorithm. This completeness guarantees that if a list can be abstracted from a producer by abstracting its list constructors, then the list abstraction algorithm will do so.

The implication of the completeness of the list abstraction algorithm is that a foldr consumer can be fused with nearly any producer. One reason list constructors might not be abstractable from a producer is that they do not occur in the producer expression but in the definition of a function which is called by the producer. A worker/wrapper scheme proposed by Chitil ensures that these list constructors are moved to the producer in order to make list abstraction possible.

The completeness property and the fact that the programmer does not have to write 
any special code, in combination with the promising results from measurements, suggest 
that short cut deforestation based on type-inference is a practical optimisation. 

Takano and Meijer [51] noted that the foldr/build rule for short cut deforestation has a dual. This is the destroy/unfoldr rule used in Zip Fusion [48], which has some interesting properties: it can remove all argument lists from a function which consumes more than one list. The method described by Svenningsson removes all intermediate lists in zip [1..n] [1..n], addressing one of the main criticisms against the foldr/build rule. The technique can also remove intermediate lists from functions which consume their lists using accumulating parameters, which is usually a problematic case. The destroy/unfoldr rule is defined as:

destroy g (unfoldr psi e) = g psi e 
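Here is a minimal Python sketch of the dual rule. Steppers return `None` at the end of a sequence and `(element, next_seed)` otherwise, taking the place of Haskell's `Maybe`. The names `upto_psi` and `dot_steps` are hypothetical. A dot product stands in for `zip`, so that no output list is needed, while still consuming two lists.

```python
def unfoldr(psi, seed):
    """Producer: psi(seed) is None to stop, or (element, next_seed) to continue."""
    out = []
    step = psi(seed)
    while step is not None:
        x, seed = step
        out.append(x)
        step = psi(seed)
    return out


def list_psi(xs):
    """Stepper that walks an existing list."""
    return None if not xs else (xs[0], xs[1:])


def destroy(g, xs):
    """Consumer: g is written against an abstract stepper and seed."""
    return g(list_psi, xs)


def upto_psi(n):
    """Hypothetical stepper yielding i, i + 1, ..., n from seed i."""
    return lambda i: None if i > n else (i, i + 1)


def dot_steps(psi1, s1, psi2, s2):
    """Hypothetical consumer of two sequences in lockstep, stopping at the shorter."""
    total = 0
    a, b = psi1(s1), psi2(s2)
    while a is not None and b is not None:
        (x, s1), (y, s2) = a, b
        total += x * y
        a, b = psi1(s1), psi2(s2)
    return total


def dot(xs, ys):
    # Both list arguments are consumed through destroy.
    return destroy(lambda p1, s1: destroy(lambda p2, s2: dot_steps(p1, s1, p2, s2), ys), xs)


print(dot(unfoldr(upto_psi(4), 1), unfoldr(upto_psi(4), 1)))  # 30, via two lists
print(dot_steps(upto_psi(4), 1, upto_psi(4), 1))              # 30, rule applied twice
```

Applying the rule once per argument turns `dot(unfoldr(p, 1), unfoldr(p, 1))` into `dot_steps(p, 1, p, 1)`, which traverses both sequences without materializing either.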

The Zip Fusion method is simple, and can be implemented in the same way as short 
cut deforestation. It still suffers from the drawback that the programmer or compiler writer 
has to make sure the list traversing functions are written using destroy and unfoldr. 

In more recent work, Coutts et al. [?] have extended these techniques to work on functions that handle nested lists, list comprehensions and filter-like functions.

8. Conclusions 

We have presented a positive supercompiler for a higher-order call-by-value language and proven it correct with respect to call-by-value semantics. The adjustments required to preserve the termination properties of call-by-value evaluation are new and work well for many examples in the literature intended to show the usefulness of call-by-name transformations.

8.1. Future Work. We believe that the linearity restriction of rule R14 is not necessary 
for the soundness of our transformation, but have not yet found a way to prove this. This is 
a natural topic for future work, as is an investigation of whether the concept of an inlining 
budget may be used to control the balance between supercompilation benefits and code size. 

More work could be done on the strictness analysis component of our supercompiler. 
We do not intend to focus on that subject, though; instead we hope that the modular 
dependency on strictness analysis will allow our supercompiler to readily take advantage of 
general improvements in the area. 

The supercompiler described in this article can be said to supersede several of the standard transformations commonly implemented by optimizing compilers, such as copy propagation, constant folding and basic inlining. We conjecture that this range could be extended to include transformations like common subexpression elimination as well, by means of moderately small algorithm changes. An investigation of the scope for such generalizations is an important area of future research.
Appendix A. Proofs

We borrow a couple of technical lemmas from Sands [39], and adapt the proofs to be valid under call-by-value:

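Lemma A.1 (Sands, p. 24). For all expressions $e$ and value substitutions $\theta$ such that $h \notin \mathrm{dom}(\theta)$, if $e_0 \mapsto e_1$ then

$$\mathsf{letrec}\ h = \lambda\overline{x}.e_1\ \mathsf{in}\ [\theta(e_0)/z]e \mathrel{\lhd\!\rhd} \mathsf{letrec}\ h = \lambda\overline{x}.e_1\ \mathsf{in}\ [h\,\theta(\overline{x})/z]e$$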

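Proof. Expanding both sides according to the definition of letrec yields:

$$(\lambda h.[\theta(e_0)/z]e)\,(\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n) \mathrel{\lhd\!\rhd} (\lambda h.[h\,\theta(\overline{x})/z]e)\,(\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n)$$

and evaluating both sides one step ($\mapsto$) gives:

$$[\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n/h][\theta(e_0)/z]e \mathrel{\lhd\!\rhd} [\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n/h][h\,\theta(\overline{x})/z]e$$

From this we can see that it is sufficient to prove:

$$[\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n/h]\,\theta e_0 \mathrel{\lhd\!\rhd} [\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n/h]\,h\,\theta(\overline{x})$$

The substitution $\theta$ can safely be moved out since $h \notin \mathrm{dom}(\theta)$:

$$[\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n/h]\,\theta e_0 \mathrel{\lhd\!\rhd} [\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n/h]\,\theta(h\,\overline{x})$$

Performing evaluation steps on both sides yields the following, where $F$ abbreviates $\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n$:

$$\begin{aligned}
[F/h]\,\theta e_0 &\mathrel{\lhd\!\rhd} [F/h]\,\theta(h\,\overline{x})\\
[F/h]\,\theta e_0 &\mathrel{\lhd\!\rhd} [F/h]\,\theta((\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n)\,\overline{x})\\
[F/h]\,\theta e_0 &\mathrel{\lhd\!\rhd} [F/h]\,\theta(\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,\overline{x})\\
[F/h]\,\theta e_1 &\mathrel{\lhd\!\rhd} [F/h]\,\theta((\lambda f.f\,(\lambda n.\mathsf{fix}\,f\,n))\,(\lambda h.\lambda\overline{x}.e_1)\,\overline{x})\\
[F/h]\,\theta e_1 &\mathrel{\lhd\!\rhd} [F/h]\,\theta((\lambda h.\lambda\overline{x}.e_1)\,(\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n)\,\overline{x})\\
[F/h]\,\theta e_1 &\mathrel{\lhd\!\rhd} [F/h]\,\theta((\lambda\overline{x}.e_1)\,\overline{x})\\
[F/h]\,\theta e_1 &\mathrel{\lhd\!\rhd} [\overline{x}/\overline{x}][F/h]\,\theta e_1\\
[F/h]\,\theta e_1 &\mathrel{\lhd\!\rhd} [F/h]\,\theta e_1
\end{aligned}$$

The LHS and the RHS are cost equivalent, so by Lemma 5.21 the initial expressions are cost equivalent. □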
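The proof relies on the call-by-value encoding of letrec, in which $\mathsf{letrec}\ h = \lambda\overline{x}.e_1\ \mathsf{in}\ e$ is read as $(\lambda h.e)\,(\lambda n.\mathsf{fix}\,(\lambda h.\lambda\overline{x}.e_1)\,n)$ and $\mathsf{fix}$ unfolds to $\lambda f.f\,(\lambda n.\mathsf{fix}\,f\,n)$. The following minimal Python sketch shows why the η-expanded $\lambda n.\mathsf{fix}\,f\,n$ matters. Python's strict evaluation stands in for the call-by-value object language, and only the single-argument case is covered. The helpers `fix`, `fix_without_eta` and `letrec` are illustrative names, not part of the paper.

```python
def fix(f):
    # fix f = f (λn. fix f n): the recursive call is wrapped in a lambda, so it
    # is only unfolded when the function is actually applied to an argument.
    return f(lambda n: fix(f)(n))


def fix_without_eta(f):
    # Naive fix f = f (fix f): under call-by-value the argument fix f is
    # evaluated before f is entered, so this recurses forever.
    return f(fix_without_eta(f))


def letrec(defn, body):
    # letrec h = λx.e1 in e, encoded as (λh.e) (λn. fix (λh.λx.e1) n);
    # defn plays the role of λh.λx.e1 and body the role of λh.e.
    return body(lambda n: fix(defn)(n))


# letrec fac = λx. if x == 0 then 1 else x * fac (x - 1) in fac 5
result = letrec(lambda fac: lambda x: 1 if x == 0 else x * fac(x - 1),
                lambda fac: fac(5))
print(result)  # 120
```

Calling `fix_without_eta` on any function exhausts Python's recursion limit. This is the call-by-value analogue of the loop that the η-expansion avoids.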

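Lemma A.2 (Sands, p. 25). $\rho'(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'}) \mathrel{\lhd\!\rhd} \mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ \rho(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'})$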

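Proof (similar to Sands [39]). By inspection of the rules for $\mathcal{D}[\![\cdot]\!]$, all free occurrences of $h$ in $\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'}$ must occur in subexpressions of the form $h\,\overline{x}$. Suppose there are $k$ such occurrences, which we can write as $\theta_1 h\,\overline{x} \ldots \theta_k h\,\overline{x}$, where the $\theta_i$ are just renamings of the variables $\overline{x}$. So $\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'}$ can be written as $[\theta_1 h\,\overline{x} \ldots \theta_k h\,\overline{x}/z_1 \ldots z_k]e'$, where $e'$ contains no free occurrences of $h$. Then (substitution associates to the right):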

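$$\begin{aligned}
\rho'(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'}) &= [\lambda\overline{x}.\mathcal{R}\langle g\rangle/h]\,\rho(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'})\\
&\mathrel{\lhd\!\rhd} [\lambda\overline{x}.\mathcal{R}\langle g\rangle/h]\,\rho([\theta_1 h\,\overline{x} \ldots \theta_k h\,\overline{x}/z_1 \ldots z_k]e')\\
&\mathrel{\lhd\!\rhd} \rho([\theta_1 \mathcal{R}\langle g\rangle \ldots \theta_k \mathcal{R}\langle g\rangle/z_1 \ldots z_k]e')\\
&\mathrel{\lhd\!\rhd} \mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ \rho([\theta_1 h\,\overline{x} \ldots \theta_k h\,\overline{x}/z_1 \ldots z_k]e') \qquad \text{(by Lemma A.1)}\\
&= \mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ \rho(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'})
\end{aligned}$$
□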

Lemma A.3. $\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle \unrhd_s \mathsf{let}\ x = e\ \mathsf{in}\ \mathcal{R}\langle f\rangle$

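Proof. Notice that $\mathcal{R}\langle \mathsf{let}\ x = \langle\rangle\ \mathsf{in}\ f\rangle$ is a reduction context, and assume $e \mapsto^k v$. The LHS evaluates in $k+1$ steps, $\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle \mapsto^k \mathcal{R}\langle \mathsf{let}\ x = v\ \mathsf{in}\ f\rangle \mapsto \mathcal{R}\langle[v/x]f\rangle$, and the RHS evaluates in $k+1$ steps, $\mathsf{let}\ x = e\ \mathsf{in}\ \mathcal{R}\langle f\rangle \mapsto^k \mathsf{let}\ x = v\ \mathsf{in}\ \mathcal{R}\langle f\rangle \mapsto [v/x]\mathcal{R}\langle f\rangle$. Since contexts do not bind variables, these two terms are equivalent, and by Lemma 5.21 the initial terms are cost equivalent. □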

Lemma A.4. $\mathcal{R}\langle \mathsf{letrec}\ g = v\ \mathsf{in}\ e\rangle \unrhd_s \mathsf{letrec}\ g = v\ \mathsf{in}\ \mathcal{R}\langle e\rangle$

Proof. Translate both sides by the definition of letrec into $\mathcal{R}\langle(\lambda g.e)\,(\lambda n.\mathsf{fix}\,(\lambda g.v)\,n)\rangle \unrhd_s (\lambda g.\mathcal{R}\langle e\rangle)\,(\lambda n.\mathsf{fix}\,(\lambda g.v)\,n)$. Notice that $\mathcal{R}$ is a reduction context. The LHS evaluates in one step to $\mathcal{R}\langle[\lambda n.\mathsf{fix}\,(\lambda g.v)\,n/g]e\rangle$, and the RHS evaluates in one step to $[\lambda n.\mathsf{fix}\,(\lambda g.v)\,n/g]\mathcal{R}\langle e\rangle$. Since our contexts do not bind variables, these two terms are equivalent, and by Lemma 5.21 the initial terms are cost equivalent. □
Lemma A.5. $\mathcal{R}\langle \mathsf{case}\ e\ \mathsf{of}\ \{p_i \to e_i\}\rangle \unrhd_s \mathsf{case}\ e\ \mathsf{of}\ \{p_i \to \mathcal{R}\langle e_i\rangle\}$

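Proof. Notice that $\mathcal{R}\langle \mathsf{case}\ \langle\rangle\ \mathsf{of}\ \{p_i \to e_i\}\rangle$ is a reduction context, and assume $e \mapsto^k n_j$. The LHS evaluates in $k+1$ steps, $\mathcal{R}\langle \mathsf{case}\ e\ \mathsf{of}\ \{p_i \to e_i\}\rangle \mapsto^k \mathcal{R}\langle \mathsf{case}\ n_j\ \mathsf{of}\ \{p_i \to e_i\}\rangle \mapsto \mathcal{R}\langle e_j\rangle$, and the RHS evaluates in $k+1$ steps, $\mathsf{case}\ e\ \mathsf{of}\ \{p_i \to \mathcal{R}\langle e_i\rangle\} \mapsto^k \mathsf{case}\ n_j\ \mathsf{of}\ \{p_i \to \mathcal{R}\langle e_i\rangle\} \mapsto \mathcal{R}\langle e_j\rangle$. Since these two terms are equivalent, the initial terms are cost equivalent by Lemma 5.21. □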

We now prove the main theorem about total correctness:

Theorem A.6 (Total Correctness). Let $\mathcal{R}\langle e\rangle$ be an expression, and $\rho$ an environment such that

• the range of $\rho$ contains only closed expressions, and

• $\mathrm{fv}(\mathcal{R}\langle e\rangle) \cap \mathrm{dom}(\rho) = \emptyset$;

then $\mathcal{R}\langle e\rangle \unrhd_s \rho(\mathcal{D}[\![e]\!]_{\mathcal{R},\mathcal{G},\rho})$.

We reason by induction on the structure of expressions, and since the algorithm is total (Lemma 5.12) this coincides with inspection of each rule.

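A.1. R1. We have that $\rho(\mathcal{D}[\![n]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{R}\langle n\rangle)$, and the conditions of the theorem ensure that $\mathrm{fv}(\mathcal{R}\langle n\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathcal{R}\langle n\rangle) = \mathcal{R}\langle n\rangle$. This is syntactically equivalent to the input, and we conclude $\mathcal{R}\langle n\rangle \unrhd_s \rho(\mathcal{D}[\![n]\!]_{\mathcal{R},\mathcal{G},\rho})$.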

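A.2. R2. We have that $\rho(\mathcal{D}[\![x]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{R}\langle x\rangle)$, and the conditions of the theorem ensure that $\mathrm{fv}(\mathcal{R}\langle x\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathcal{R}\langle x\rangle) = \mathcal{R}\langle x\rangle$. This is syntactically equivalent to the input, and we conclude $\mathcal{R}\langle x\rangle \unrhd_s \rho(\mathcal{D}[\![x]\!]_{\mathcal{R},\mathcal{G},\rho})$.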

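A.3. R3.

A.3.1. Case: (1).

Suppose $\exists h.\ \rho(h) = \lambda\overline{x}.\mathcal{R}\langle g\rangle$ and hence that $\mathcal{D}[\![\mathcal{R}\langle g\rangle]\!]_{\langle\rangle,\mathcal{G},\rho} = h\,\overline{x}$.

The conditions of the theorem ensure that $\overline{x} \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathcal{D}[\![\mathcal{R}\langle g\rangle]\!]_{\langle\rangle,\mathcal{G},\rho}) = \rho(h\,\overline{x}) = (\lambda\overline{x}.\mathcal{R}\langle g\rangle)\,\overline{x}$. Moreover, $\mathcal{R}\langle g\rangle$ and $(\lambda\overline{x}.\mathcal{R}\langle g\rangle)\,\overline{x}$ are cost equivalent, which implies strong improvement, and we conclude $\mathcal{R}\langle g\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle g\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$.
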
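A.3.2. Case: (2).

Suppose $\exists (h,t) \in \rho.\ t \trianglelefteq \mathcal{R}\langle g\rangle$ and that $\mathcal{R}\langle g\rangle \trianglelefteq t$, hence $\mathcal{D}[\![\mathcal{R}\langle g\rangle]\!]_{\langle\rangle,\mathcal{G},\rho} = \mathcal{R}\langle g\rangle$.

The term on the RHS is discarded and replaced with a new term higher up in the tree, so it does not matter what the term is.
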
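A.3.3. Case: (3).

Suppose $\exists (h,t) \in \rho.\ t \trianglelefteq \mathcal{R}\langle g\rangle$ and hence that $\mathcal{D}[\![\mathcal{R}\langle g\rangle]\!]_{\langle\rangle,\mathcal{G},\rho} = [\mathcal{D}[\![\overline{f}]\!]_{\langle\rangle,\mathcal{G},\rho}/\overline{x}]\,\mathcal{D}[\![f_g]\!]_{\langle\rangle,\mathcal{G},\rho}$.

We have $\rho(\mathcal{D}[\![\mathcal{R}\langle g\rangle]\!]_{\langle\rangle,\mathcal{G},\rho}) = \rho([\mathcal{D}[\![\overline{f}]\!]_{\langle\rangle,\mathcal{G},\rho}/\overline{x}]\,\mathcal{D}[\![f_g]\!]_{\langle\rangle,\mathcal{G},\rho}) = [\rho(\mathcal{D}[\![\overline{f}]\!]_{\langle\rangle,\mathcal{G},\rho})/\overline{x}]\,\rho(\mathcal{D}[\![f_g]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis, $\overline{f} \unrhd_s \rho(\mathcal{D}[\![\overline{f}]\!]_{\langle\rangle,\mathcal{G},\rho})$ and $f_g \unrhd_s \rho(\mathcal{D}[\![f_g]\!]_{\langle\rangle,\mathcal{G},\rho})$, and by congruence properties of strong improvement (Lemma 5.23(1)) $\mathcal{R}\langle g\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle g\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$.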

A.3.4. Case: (4a). Analogous to the previous case.

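A.3.5. Case: (4b).

Suppose $\mathcal{D}[\![\mathcal{R}\langle g\rangle]\!]_{\langle\rangle,\mathcal{G},\rho} = \mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'}\ \mathsf{in}\ h\,\overline{x}$, where $\rho' = \rho \cup \{(h, \lambda\overline{x}.\mathcal{R}\langle g\rangle)\}$ and $h \notin (\overline{x} \cup \mathrm{dom}(\rho))$. We need to show that:

$$\mathcal{R}\langle g\rangle \unrhd_s \rho(\mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'}\ \mathsf{in}\ h\,\overline{x})$$

Since $h, \overline{x} \notin \mathrm{dom}(\rho)$ we have that $\rho(\mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'}\ \mathsf{in}\ h\,\overline{x}) = \mathsf{letrec}\ h = \lambda\overline{x}.\rho(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'})\ \mathsf{in}\ h\,\overline{x}$.

$\mathcal{R}$ is a reduction context, hence $\mathcal{R}\langle g\rangle \mapsto \mathcal{R}\langle v\rangle$. By Lemma A.1 we have that $\mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ \mathcal{R}\langle g\rangle \mathrel{\lhd\!\rhd} \mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ h\,\overline{x}$. Since $h \notin \mathrm{fv}(\mathcal{R}\langle g\rangle)$, this simplifies to $\mathcal{R}\langle g\rangle \mathrel{\lhd\!\rhd} \mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ h\,\overline{x}$. It is necessary and sufficient to prove that

$$\mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ h\,\overline{x} \unrhd_s \mathsf{letrec}\ h = \lambda\overline{x}.\rho(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'})\ \mathsf{in}\ h\,\overline{x}$$

By Theorem 5.27 it is sufficient to show:

$$\mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ \mathcal{R}\langle v\rangle \unrhd_s \mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ \rho(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'})$$

By Lemma A.2 and $\mathsf{letrec}\ h = \lambda\overline{x}.\mathcal{R}\langle v\rangle\ \mathsf{in}\ \mathcal{R}\langle v\rangle \mathrel{\lhd\!\rhd} \mathcal{R}\langle v\rangle$, this is equivalent to showing that

$$\mathcal{R}\langle v\rangle \unrhd_s \rho'(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho'})$$

which follows from the induction hypothesis, since it is a shorter transformation.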

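A.3.6. Case: (4c). We have that $\rho(\mathcal{D}[\![g]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis $\mathcal{R}\langle v\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle v\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and since $\mathcal{R}\langle g\rangle \mapsto \mathcal{R}\langle v\rangle$ it follows from Lemma 5.23(4) that $\mathcal{R}\langle g\rangle \unrhd_s \rho(\mathcal{D}[\![g]\!]_{\mathcal{R},\mathcal{G},\rho})$.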

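A.4. R4. We have that $\rho(\mathcal{D}[\![k\,\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho}) = \rho(k\,\mathcal{D}[\![\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho})$, and the conditions of the theorem ensure that $\mathrm{fv}(k\,\overline{e}) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(k\,\mathcal{D}[\![\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho}) = k\,\rho(\mathcal{D}[\![\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis, $\overline{e} \unrhd_s \rho(\mathcal{D}[\![\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho})$, and by congruence properties of strong improvement (Lemma 5.23(1)) $k\,\overline{e} \unrhd_s \rho(\mathcal{D}[\![k\,\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho})$.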

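A.5. R5. We have that $\rho(\mathcal{D}[\![x\,\overline{e}]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{R}\langle x\,\mathcal{D}[\![\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho}\rangle)$, and the conditions of the theorem ensure that $\mathrm{fv}(\mathcal{R}\langle x\,\overline{e}\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathcal{R}\langle x\,\mathcal{D}[\![\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho}\rangle) = \mathcal{R}\langle x\,\rho(\mathcal{D}[\![\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho})\rangle$. By the induction hypothesis, $\overline{e} \unrhd_s \rho(\mathcal{D}[\![\overline{e}]\!]_{\langle\rangle,\mathcal{G},\rho})$, and by congruence properties of strong improvement (Lemma 5.23(1)) $\mathcal{R}\langle x\,\overline{e}\rangle \unrhd_s \rho(\mathcal{D}[\![x\,\overline{e}]\!]_{\mathcal{R},\mathcal{G},\rho})$.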
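A.6. R6. We have that $\rho(\mathcal{D}[\![\lambda\overline{x}.e]\!]_{\langle\rangle,\mathcal{G},\rho}) = \rho(\lambda\overline{x}.\mathcal{D}[\![e]\!]_{\langle\rangle,\mathcal{G},\rho})$, and the conditions of the theorem ensure that $\mathrm{fv}(\lambda\overline{x}.e) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\lambda\overline{x}.\mathcal{D}[\![e]\!]_{\langle\rangle,\mathcal{G},\rho}) = \lambda\overline{x}.\rho(\mathcal{D}[\![e]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis, $e \unrhd_s \rho(\mathcal{D}[\![e]\!]_{\langle\rangle,\mathcal{G},\rho})$, and by congruence properties of strong improvement (Lemma 5.23(1)) $\lambda\overline{x}.e \unrhd_s \rho(\mathcal{D}[\![\lambda\overline{x}.e]\!]_{\langle\rangle,\mathcal{G},\rho})$.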

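A.7. R7. We have that $\rho(\mathcal{D}[\![n_1 \oplus n_2]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![\mathcal{R}\langle n\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, where $n$ is the result of $n_1 \oplus n_2$. By the induction hypothesis, $\mathcal{R}\langle n\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle n\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and since $\mathcal{R}\langle n_1 \oplus n_2\rangle \mapsto \mathcal{R}\langle n\rangle$ it follows from Lemma 5.23(4) that $\mathcal{R}\langle n_1 \oplus n_2\rangle \unrhd_s \rho(\mathcal{D}[\![n_1 \oplus n_2]\!]_{\mathcal{R},\mathcal{G},\rho})$.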

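A.8. R8.

a) $e_1 \oplus e_2 = a$: We have that $\rho(\mathcal{D}[\![e_1 \oplus e_2]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{R}\langle \mathcal{D}[\![e_1]\!]_{\langle\rangle,\mathcal{G},\rho} \oplus \mathcal{D}[\![e_2]\!]_{\langle\rangle,\mathcal{G},\rho}\rangle)$; by the given conditions $\mathrm{fv}(\mathcal{R}\langle e_1 \oplus e_2\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathcal{R}\langle \mathcal{D}[\![e_1]\!]_{\langle\rangle,\mathcal{G},\rho} \oplus \mathcal{D}[\![e_2]\!]_{\langle\rangle,\mathcal{G},\rho}\rangle) = \mathcal{R}\langle \rho(\mathcal{D}[\![e_1]\!]_{\langle\rangle,\mathcal{G},\rho}) \oplus \rho(\mathcal{D}[\![e_2]\!]_{\langle\rangle,\mathcal{G},\rho})\rangle$. By the induction hypothesis $e_1 \unrhd_s \rho(\mathcal{D}[\![e_1]\!]_{\langle\rangle,\mathcal{G},\rho})$ and $e_2 \unrhd_s \rho(\mathcal{D}[\![e_2]\!]_{\langle\rangle,\mathcal{G},\rho})$, and by congruence properties of strong improvement (Lemma 5.23(1)) $\mathcal{R}\langle e_1 \oplus e_2\rangle \unrhd_s \rho(\mathcal{D}[\![e_1 \oplus e_2]\!]_{\mathcal{R},\mathcal{G},\rho})$.

b) $e_1 = n$ or $e_1 = a$: We have $\rho(\mathcal{D}[\![e_1 \oplus e_2]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![e_2]\!]_{\mathcal{R}\langle e_1 \oplus \langle\rangle\rangle,\mathcal{G},\rho})$, and $\mathcal{R}\langle e_1 \oplus e_2\rangle \unrhd_s \rho(\mathcal{D}[\![e_2]\!]_{\mathcal{R}\langle e_1 \oplus \langle\rangle\rangle,\mathcal{G},\rho})$ follows from the induction hypothesis.

c) otherwise: We have that $\rho(\mathcal{D}[\![e_1 \oplus e_2]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![e_1]\!]_{\mathcal{R}\langle \langle\rangle \oplus e_2\rangle,\mathcal{G},\rho})$, and $\mathcal{R}\langle e_1 \oplus e_2\rangle \unrhd_s \rho(\mathcal{D}[\![e_1]\!]_{\mathcal{R}\langle \langle\rangle \oplus e_2\rangle,\mathcal{G},\rho})$ follows from the induction hypothesis.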

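A.9. R9. We have that $\rho(\mathcal{D}[\![(\lambda x.f)\,e]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. Evaluating the input term yields $\mathcal{R}\langle(\lambda x.f)\,e\rangle \mapsto^k \mathcal{R}\langle(\lambda x.f)\,v\rangle \mapsto \mathcal{R}\langle[v/x]f\rangle$, and evaluating the input to the recursive call yields $\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle \mapsto^k \mathcal{R}\langle \mathsf{let}\ x = v\ \mathsf{in}\ f\rangle \mapsto \mathcal{R}\langle[v/x]f\rangle$. These two resulting terms are syntactically equivalent, and therefore cost equivalent. By Lemma 5.21 their ancestor terms are cost equivalent, $\mathcal{R}\langle(\lambda x.f)\,e\rangle \mathrel{\lhd\!\rhd} \mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle$, and cost equivalence implies strong improvement. By the induction hypothesis $\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and therefore $\mathcal{R}\langle(\lambda x.f)\,e\rangle \unrhd_s \rho(\mathcal{D}[\![(\lambda x.f)\,e]\!]_{\mathcal{R},\mathcal{G},\rho})$.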

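A.10. R10. We have $\rho(\mathcal{D}[\![e\,e']\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![e]\!]_{\mathcal{R}\langle\langle\rangle\,e'\rangle,\mathcal{G},\rho})$, and $\mathcal{R}\langle e\,e'\rangle \unrhd_s \rho(\mathcal{D}[\![e]\!]_{\mathcal{R}\langle\langle\rangle\,e'\rangle,\mathcal{G},\rho})$ follows from the induction hypothesis.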

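A.11. R11. We have that $\rho(\mathcal{D}[\![\mathsf{let}\ x = n\ \mathsf{in}\ f]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![\mathcal{R}\langle[n/x]f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis $\mathcal{R}\langle[n/x]f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle[n/x]f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and since $\mathcal{R}\langle \mathsf{let}\ x = n\ \mathsf{in}\ f\rangle \mapsto \mathcal{R}\langle[n/x]f\rangle$ it follows from Lemma 5.23(4) that $\mathcal{R}\langle \mathsf{let}\ x = n\ \mathsf{in}\ f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{let}\ x = n\ \mathsf{in}\ f]\!]_{\mathcal{R},\mathcal{G},\rho})$.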

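A.12. R12. We have that $\rho(\mathcal{D}[\![\mathsf{let}\ x = y\ \mathsf{in}\ f]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![\mathcal{R}\langle[y/x]f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis $\mathcal{R}\langle[y/x]f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle[y/x]f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and since $\mathcal{R}\langle \mathsf{let}\ x = y\ \mathsf{in}\ f\rangle \mathrel{\lhd\!\rhd} \mathcal{R}\langle[y/x]f\rangle$ it follows that $\mathcal{R}\langle \mathsf{let}\ x = y\ \mathsf{in}\ f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{let}\ x = y\ \mathsf{in}\ f]\!]_{\mathcal{R},\mathcal{G},\rho})$.
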
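A.13. R13.

A.13.1. Case: $x \in \mathit{strict}(f)$. We have $\rho(\mathcal{D}[\![\mathsf{let}\ x = e\ \mathsf{in}\ f]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![\mathcal{R}\langle[e/x]f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. Evaluating the input term yields $\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle \mapsto^k \mathcal{R}\langle \mathsf{let}\ x = v\ \mathsf{in}\ f\rangle \mapsto \mathcal{R}\langle[v/x]f\rangle \mapsto^* \mathcal{E}\langle v\rangle$, and, since $x \in \mathit{strict}(f)$, evaluating the input to the recursive call yields $\mathcal{R}\langle[e/x]f\rangle \mapsto^* \mathcal{E}\langle e\rangle \mapsto^k \mathcal{E}\langle v\rangle$. These two resulting terms are syntactically equivalent, and therefore cost equivalent. By Lemma 5.21 their ancestor terms are cost equivalent, $\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle \mathrel{\lhd\!\rhd} \mathcal{R}\langle[e/x]f\rangle$, and cost equivalence implies strong improvement. By the induction hypothesis $\mathcal{R}\langle[e/x]f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle[e/x]f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and therefore $\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{let}\ x = e\ \mathsf{in}\ f]\!]_{\mathcal{R},\mathcal{G},\rho})$.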

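A.13.2. Case: otherwise. We have that $\rho(\mathcal{D}[\![\mathsf{let}\ x = e\ \mathsf{in}\ f]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathsf{let}\ x = \mathcal{D}[\![e]\!]_{\langle\rangle,\mathcal{G},\rho}\ \mathsf{in}\ \mathcal{D}[\![\mathcal{R}\langle f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and the conditions of the theorem ensure that $\mathrm{fv}(\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathsf{let}\ x = \mathcal{D}[\![e]\!]_{\langle\rangle,\mathcal{G},\rho}\ \mathsf{in}\ \mathcal{D}[\![\mathcal{R}\langle f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho}) = \mathsf{let}\ x = \rho(\mathcal{D}[\![e]\!]_{\langle\rangle,\mathcal{G},\rho})\ \mathsf{in}\ \rho(\mathcal{D}[\![\mathcal{R}\langle f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis $e \unrhd_s \rho(\mathcal{D}[\![e]\!]_{\langle\rangle,\mathcal{G},\rho})$ and $\mathcal{R}\langle f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle f\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. By Lemma A.3 the input is strongly improved by $\mathsf{let}\ x = e\ \mathsf{in}\ \mathcal{R}\langle f\rangle$, and therefore $\mathcal{R}\langle \mathsf{let}\ x = e\ \mathsf{in}\ f\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{let}\ x = e\ \mathsf{in}\ f]\!]_{\mathcal{R},\mathcal{G},\rho})$.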
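These two cases are what keep R13 from introducing accidental termination. A binding is substituted away only when $x \in \mathit{strict}(f)$. Otherwise the let, and with it the evaluation of $e$, is kept. The following minimal Python sketch shows this decision for a hypothetical first-order fragment, with the following assumptions:

- The syntactic `strict` function is a crude stand-in for whatever strictness analysis is plugged in.
- The fuel-bounded evaluator treats running out of steps as divergence.
- All names are illustrative.
- The sketch ignores duplication of $e$ when $x$ occurs several times in $f$.

```python
from dataclasses import dataclass
from typing import Dict, Set, Union


@dataclass(frozen=True)
class Num:
    n: int

@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Add:
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class If0:
    cond: "Expr"
    then: "Expr"
    else_: "Expr"

@dataclass(frozen=True)
class Let:
    name: str
    bound: "Expr"
    body: "Expr"

@dataclass(frozen=True)
class Loop:
    """A diverging expression, playing the role of fac z for negative z."""

Expr = Union[Num, Var, Add, If0, Let, Loop]


def strict(e: Expr) -> Set[str]:
    """Variables certainly evaluated whenever e is (a safe under-approximation)."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Add):
        return strict(e.left) | strict(e.right)
    if isinstance(e, If0):
        return strict(e.cond) | (strict(e.then) & strict(e.else_))
    if isinstance(e, Let):
        return strict(e.bound) | (strict(e.body) - {e.name})
    return set()  # Num, Loop

def subst(e: Expr, x: str, r: Expr) -> Expr:
    """[r/x]e, assuming bound names never clash with free variables of r."""
    if isinstance(e, Var):
        return r if e.name == x else e
    if isinstance(e, Add):
        return Add(subst(e.left, x, r), subst(e.right, x, r))
    if isinstance(e, If0):
        return If0(subst(e.cond, x, r), subst(e.then, x, r), subst(e.else_, x, r))
    if isinstance(e, Let):
        body = e.body if e.name == x else subst(e.body, x, r)
        return Let(e.name, subst(e.bound, x, r), body)
    return e

def drive_let(x: str, e: Expr, f: Expr) -> Expr:
    """R13-style choice: substitute only when x is in strict(f); otherwise keep the let."""
    return subst(f, x, e) if x in strict(f) else Let(x, e, f)

class OutOfFuel(Exception):
    pass

def run(e: Expr, fuel: int = 1000) -> int:
    """Call-by-value evaluation; raises OutOfFuel (read as divergence) after `fuel` steps."""
    budget = [fuel]

    def tick() -> None:
        budget[0] -= 1
        if budget[0] < 0:
            raise OutOfFuel()

    def ev(e: Expr, env: Dict[str, int]) -> int:
        tick()
        if isinstance(e, Num):
            return e.n
        if isinstance(e, Var):
            return env[e.name]
        if isinstance(e, Add):
            return ev(e.left, env) + ev(e.right, env)
        if isinstance(e, If0):
            return ev(e.then, env) if ev(e.cond, env) == 0 else ev(e.else_, env)
        if isinstance(e, Let):
            v = ev(e.bound, env)  # call-by-value: the bound expression is evaluated first
            return ev(e.body, {**env, e.name: v})
        while True:  # Loop: consume fuel forever
            tick()

    return ev(e, {})


print(drive_let("x", Loop(), Num(1)))  # let kept: 1 is not strict in x
print(subst(Num(1), "x", Loop()))      # naive substitution drops the divergence
```

The naive substitution turns a diverging program into the constant 1. This is the accidental termination that the strictness condition rules out.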

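A.14. R14. We have that $\rho(\mathcal{D}[\![\mathsf{letrec}\ g = v\ \mathsf{in}\ e]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathsf{letrec}\ g = v\ \mathsf{in}\ \mathcal{D}[\![\mathcal{R}\langle e\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and the conditions of the theorem ensure that $\mathrm{fv}(\mathcal{R}\langle \mathsf{letrec}\ g = v\ \mathsf{in}\ e\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathsf{letrec}\ g = v\ \mathsf{in}\ \mathcal{D}[\![\mathcal{R}\langle e\rangle]\!]_{\langle\rangle,\mathcal{G},\rho}) = \mathsf{letrec}\ g = v\ \mathsf{in}\ \rho(\mathcal{D}[\![\mathcal{R}\langle e\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis $\mathcal{R}\langle e\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle e\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. By Lemma A.4 the input is strongly improved by $\mathsf{letrec}\ g = v\ \mathsf{in}\ \mathcal{R}\langle e\rangle$, and therefore $\mathcal{R}\langle \mathsf{letrec}\ g = v\ \mathsf{in}\ e\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{letrec}\ g = v\ \mathsf{in}\ e]\!]_{\mathcal{R},\mathcal{G},\rho})$.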

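A.15. R15. We have $\rho(\mathcal{D}[\![\mathsf{case}\ x\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathsf{case}\ x\ \mathsf{of}\ \{p_i \to \mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\langle\rangle,\mathcal{G},\rho}\})$, and the conditions of the theorem ensure that $\mathrm{fv}(\mathcal{R}\langle \mathsf{case}\ x\ \mathsf{of}\ \{p_i \to e_i\}\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathsf{case}\ x\ \mathsf{of}\ \{p_i \to \mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\langle\rangle,\mathcal{G},\rho}\}) = \mathsf{case}\ x\ \mathsf{of}\ \{p_i \to \rho(\mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})\}$. By the induction hypothesis $\mathcal{R}\langle e_i\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. Using Lemma A.5 the input is strongly improved by $\mathsf{case}\ x\ \mathsf{of}\ \{p_i \to \mathcal{R}\langle e_i\rangle\}$, and therefore $\mathcal{R}\langle \mathsf{case}\ x\ \mathsf{of}\ \{p_i \to e_i\}\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{case}\ x\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho})$.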

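A.16. R16. We have $\rho(\mathcal{D}[\![\mathsf{case}\ k_j\,\overline{e}\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![\mathcal{R}\langle \mathsf{let}\ \overline{x}_j = \overline{e}\ \mathsf{in}\ e_j\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. Evaluating the input term yields $\mathcal{R}\langle \mathsf{case}\ k_j\,\overline{e}\ \mathsf{of}\ \{p_i \to e_i\}\rangle \mapsto^k \mathcal{R}\langle \mathsf{case}\ k_j\,\overline{v}\ \mathsf{of}\ \{p_i \to e_i\}\rangle \mapsto \mathcal{R}\langle[\overline{v}/\overline{x}_j]e_j\rangle$, and evaluating the input to the recursive call yields $\mathcal{R}\langle \mathsf{let}\ \overline{x}_j = \overline{e}\ \mathsf{in}\ e_j\rangle \mapsto^k \mathcal{R}\langle \mathsf{let}\ \overline{x}_j = \overline{v}\ \mathsf{in}\ e_j\rangle \mapsto \mathcal{R}\langle[\overline{v}/\overline{x}_j]e_j\rangle$. These two resulting terms are syntactically equivalent, and therefore cost equivalent. By Lemma 5.21 their ancestor terms are cost equivalent, $\mathcal{R}\langle \mathsf{case}\ k_j\,\overline{e}\ \mathsf{of}\ \{p_i \to e_i\}\rangle \mathrel{\lhd\!\rhd} \mathcal{R}\langle \mathsf{let}\ \overline{x}_j = \overline{e}\ \mathsf{in}\ e_j\rangle$, and cost equivalence implies strong improvement. According to the induction hypothesis $\mathcal{R}\langle \mathsf{let}\ \overline{x}_j = \overline{e}\ \mathsf{in}\ e_j\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle \mathsf{let}\ \overline{x}_j = \overline{e}\ \mathsf{in}\ e_j\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and therefore $\mathcal{R}\langle \mathsf{case}\ k_j\,\overline{e}\ \mathsf{of}\ \{p_i \to e_i\}\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{case}\ k_j\,\overline{e}\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho})$.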
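Rule R16 turns a case over a known constructor into let-bindings instead of substituting the constructor arguments directly. Under call-by-value the arguments are therefore still evaluated, and the resulting lets are then driven by the let rules, including the strictness check of R13. Below is a minimal Python sketch of just this rewrite over a hypothetical AST. The class names and `known_constructor` are illustrative, and left-to-right evaluation of constructor arguments is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Con:
    """Constructor application k e1 ... en."""
    name: str
    args: Tuple[object, ...]


@dataclass(frozen=True)
class Alt:
    """Case branch k x1 ... xn -> body."""
    con: str
    params: Tuple[str, ...]
    body: object


@dataclass(frozen=True)
class Case:
    scrut: object
    alts: Tuple[Alt, ...]


@dataclass(frozen=True)
class Let:
    name: str
    bound: object
    body: object


def known_constructor(e):
    """case k e1..en of {...; k x1..xn -> b; ...}  ==>  let x1 = e1 in ... let xn = en in b"""
    if isinstance(e, Case) and isinstance(e.scrut, Con):
        for alt in e.alts:
            if alt.con == e.scrut.name and len(alt.params) == len(e.scrut.args):
                body = alt.body
                # Wrap from the inside out so that the outermost let binds e1,
                # preserving left-to-right evaluation of the arguments.
                for x, arg in reversed(list(zip(alt.params, e.scrut.args))):
                    body = Let(x, arg, body)
                return body
    return e  # not a case over a known constructor: leave unchanged


e = Case(Con("Cons", (Var("a"), Var("b"))),
         (Alt("Nil", (), Var("n")), Alt("Cons", ("x", "xs"), Var("x"))))
print(known_constructor(e))
```

The binding for `xs` survives even though the branch ignores it. Whether it may be dropped is decided later with strictness information, not by this rewrite.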

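A.17. R17. We have that $\rho(\mathcal{D}[\![\mathsf{case}\ n_j\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![\mathcal{R}\langle e_j\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$. By the induction hypothesis $\mathcal{R}\langle e_j\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle e_j\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and since $\mathcal{R}\langle \mathsf{case}\ n_j\ \mathsf{of}\ \{p_i \to e_i\}\rangle \mapsto \mathcal{R}\langle e_j\rangle$ it follows from Lemma 5.23(4) that $\mathcal{R}\langle \mathsf{case}\ n_j\ \mathsf{of}\ \{p_i \to e_i\}\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{case}\ n_j\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho})$.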
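A.18. R18. We have that $\rho(\mathcal{D}[\![\mathsf{case}\ a\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathsf{case}\ \mathcal{D}[\![a]\!]_{\langle\rangle,\mathcal{G},\rho}\ \mathsf{of}\ \{p_i \to \mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\langle\rangle,\mathcal{G},\rho}\})$, and the conditions of the theorem ensure that $\mathrm{fv}(\mathcal{R}\langle \mathsf{case}\ a\ \mathsf{of}\ \{p_i \to e_i\}\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathsf{case}\ \mathcal{D}[\![a]\!]_{\langle\rangle,\mathcal{G},\rho}\ \mathsf{of}\ \{p_i \to \mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\langle\rangle,\mathcal{G},\rho}\}) = \mathsf{case}\ \rho(\mathcal{D}[\![a]\!]_{\langle\rangle,\mathcal{G},\rho})\ \mathsf{of}\ \{p_i \to \rho(\mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})\}$. By the induction hypothesis $a \unrhd_s \rho(\mathcal{D}[\![a]\!]_{\langle\rangle,\mathcal{G},\rho})$ and $\mathcal{R}\langle e_i\rangle \unrhd_s \rho(\mathcal{D}[\![\mathcal{R}\langle e_i\rangle]\!]_{\langle\rangle,\mathcal{G},\rho})$, and by Lemma A.5 the input is strongly improved by $\mathsf{case}\ a\ \mathsf{of}\ \{p_i \to \mathcal{R}\langle e_i\rangle\}$, and therefore $\mathcal{R}\langle \mathsf{case}\ a\ \mathsf{of}\ \{p_i \to e_i\}\rangle \unrhd_s \rho(\mathcal{D}[\![\mathsf{case}\ a\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho})$.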

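A.19. R19. We have that $\rho(\mathcal{D}[\![\mathsf{case}\ e\ \mathsf{of}\ \{p_i \to e_i\}]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{D}[\![e]\!]_{\mathcal{R}\langle \mathsf{case}\ \langle\rangle\ \mathsf{of}\ \{p_i \to e_i\}\rangle,\mathcal{G},\rho})$, and $\mathcal{R}\langle \mathsf{case}\ e\ \mathsf{of}\ \{p_i \to e_i\}\rangle \unrhd_s \rho(\mathcal{D}[\![e]\!]_{\mathcal{R}\langle \mathsf{case}\ \langle\rangle\ \mathsf{of}\ \{p_i \to e_i\}\rangle,\mathcal{G},\rho})$ follows from the induction hypothesis.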

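A.20. R20. We have that $\rho(\mathcal{D}[\![e]\!]_{\mathcal{R},\mathcal{G},\rho}) = \rho(\mathcal{R}\langle e\rangle)$, and the conditions of the theorem ensure that $\mathrm{fv}(\mathcal{R}\langle e\rangle) \cap \mathrm{dom}(\rho) = \emptyset$, so $\rho(\mathcal{R}\langle e\rangle) = \mathcal{R}\langle e\rangle$. This is syntactically equivalent to the input, and we conclude $\mathcal{R}\langle e\rangle \unrhd_s \rho(\mathcal{D}[\![e]\!]_{\mathcal{R},\mathcal{G},\rho})$.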